# Entity Explorer - Linux Host
 <details>
     <summary>&nbsp;<u>Details...</u></summary>

 **Notebook Version:** 1.0<br>
 **Python Version:** Python 3.6 (including Python 3.6 - AzureML)<br>
 **Required Packages**: kqlmagic, msticpy, pandas, pandas_bokeh, numpy, matplotlib, networkx, seaborn, datetime, ipywidgets, ipython, dnspython, ipwhois, folium, maxminddb_geolite2<br>
 **Platforms Supported**:
 - Azure Notebooks Free Compute
 - Azure Notebooks DSVM
 - OS Independent

 **Data Sources Required**:
 - Log Analytics/Azure Sentinel - Syslog, Security Alerts, Auditd, Azure Network Analytics.
 - (Optional) - AlienVault OTX (requires account and API key)
 </details>

This notebook brings together a series of tools and techniques to enable threat hunting within the context of a single Linux host. It draws on a range of data sources, but to support the widest possible range of scenarios it prioritizes common Syslog data. If detailed auditd data is available for a host, you may wish to edit the notebook to rely primarily on that dataset. As it currently stands, auditd is used when available to provide insight not otherwise available via Syslog.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Notebook-Initialization" data-toc-modified-id="Notebook-Initialization-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Notebook Initialization</a></span><ul class="toc-item"><li><span><a href="#Get-WorkspaceId-and-Authenticate-to-Log-Analytics" data-toc-modified-id="Get-WorkspaceId-and-Authenticate-to-Log-Analytics-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Get WorkspaceId and Authenticate to Log Analytics</a></span></li></ul></li><li><span><a href="#Set-Hunting-Time-Frame" data-toc-modified-id="Set-Hunting-Time-Frame-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Set Hunting Time Frame</a></span><ul class="toc-item"><li><span><a href="#Select-Host-to-Investigate" data-toc-modified-id="Select-Host-to-Investigate-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Select Host to Investigate</a></span></li></ul></li><li><span><a href="#Host-Summary" data-toc-modified-id="Host-Summary-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Host Summary</a></span><ul class="toc-item"><li><span><a href="#Host-Alerts" data-toc-modified-id="Host-Alerts-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Host Alerts</a></span></li></ul></li><li><span><a href="#Re-scope-Hunting-Time-Frame" data-toc-modified-id="Re-scope-Hunting-Time-Frame-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Re-scope Hunting Time Frame</a></span></li><li><span><a href="#How-to-use-this-Notebook" data-toc-modified-id="How-to-use-this-Notebook-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>How to use this Notebook</a></span></li><li><span><a href="#Host-Logon-Events" data-toc-modified-id="Host-Logon-Events-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Host Logon Events</a></span><ul class="toc-item"><li><span><a href="#Logon-Sessions" data-toc-modified-id="Logon-Sessions-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Logon Sessions</a></span><ul class="toc-item"><li><span><a href="#Session-Details" 
data-toc-modified-id="Session-Details-6.1.1"><span class="toc-item-num">6.1.1&nbsp;&nbsp;</span>Session Details</a></span></li><li><span><a href="#Raw-data-from-user-session" data-toc-modified-id="Raw-data-from-user-session-6.1.2"><span class="toc-item-num">6.1.2&nbsp;&nbsp;</span>Raw data from user session</a></span></li></ul></li><li><span><a href="#Process-Tree-from-session" data-toc-modified-id="Process-Tree-from-session-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Process Tree from session</a></span></li><li><span><a href="#Sudo-Session-Investigation" data-toc-modified-id="Sudo-Session-Investigation-6.3"><span class="toc-item-num">6.3&nbsp;&nbsp;</span>Sudo Session Investigation</a></span></li></ul></li><li><span><a href="#User-Activity" data-toc-modified-id="User-Activity-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>User Activity</a></span></li><li><span><a href="#Application-Activity" data-toc-modified-id="Application-Activity-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Application Activity</a></span><ul class="toc-item"><li><span><a href="#Display-process-tree" data-toc-modified-id="Display-process-tree-8.1"><span class="toc-item-num">8.1&nbsp;&nbsp;</span>Display process tree</a></span></li><li><span><a href="#Application-Logs-with-associated-Threat-Intelligence" data-toc-modified-id="Application-Logs-with-associated-Threat-Intelligence-8.2"><span class="toc-item-num">8.2&nbsp;&nbsp;</span>Application Logs with associated Threat Intelligence</a></span></li></ul></li><li><span><a href="#Network-Activity" data-toc-modified-id="Network-Activity-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Network Activity</a></span><ul class="toc-item"><li><span><a href="#Choose-ASNs/IPs-to-Check-for-Threat-Intel-Reports" data-toc-modified-id="Choose-ASNs/IPs-to-Check-for-Threat-Intel-Reports-9.1"><span class="toc-item-num">9.1&nbsp;&nbsp;</span>Choose ASNs/IPs to Check for Threat Intel Reports</a></span></li></ul></li><li><span><a 
href="#Configuration" data-toc-modified-id="Configuration-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Configuration</a></span><ul class="toc-item"><li><span><a href="#msticpyconfig.yaml-configuration-File" data-toc-modified-id="msticpyconfig.yaml-configuration-File-10.1"><span class="toc-item-num">10.1&nbsp;&nbsp;</span><code>msticpyconfig.yaml</code> configuration File</a></span></li></ul></li></ul></div>

# Hunting Hypothesis: 
Our broad initial hunting hypothesis is that a particular Linux host in our environment has been compromised. We will need to hunt from a range of different positions to validate or disprove this hypothesis.


---
### Notebook initialization
The next cell:
- Checks for the correct Python version
- Checks versions and optionally installs required packages
- Imports the required packages into the notebook
- Sets a number of configuration options.

This should complete without errors. If you encounter errors or warnings look at the following two notebooks:
- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)
- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)

If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:
- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)
- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)

You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. 
There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:
- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)
- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)

In [None]:
from pathlib import Path
import importlib  # needed for importlib.reload() in the msticpy upgrade path below
import os
import sys
import warnings
from IPython.display import display, HTML, Markdown

REQ_PYTHON_VER=(3, 6)
REQ_MSTICPY_VER=(0, 5, 0)

display(HTML("<h3>Starting Notebook setup...</h3>"))
if Path("./utils/nb_check.py").is_file():
    from utils.nb_check import check_python_ver, check_mp_ver

    check_python_ver(min_py_ver=REQ_PYTHON_VER)
    try:
        check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)
    except ImportError:
        !pip install --user --upgrade msticpy
        if "msticpy" in sys.modules:
            importlib.reload(msticpy)
        else:
            import msticpy
        check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)
            
from msticpy.nbtools import nbinit
extra_imports = [
    "msticpy.nbtools, observationlist",
    "msticpy.nbtools.foliummap, get_map_center",
    "pyvis.network, Network",
    "re",
    "ipwhois, IPWhois",
    "pandas_bokeh",
    "bokeh.palettes, viridis",
    "dns, reversename",
    "dns, resolver"
]
additional_packages = [
    "oauthlib", "pyvis", "python-whois", "pandas_bokeh"
]
nbinit.init_notebook(
    namespace=globals(),
    additional_packages=additional_packages,
    extra_imports=extra_imports,
);

WIDGET_DEFAULTS = {
    "layout": widgets.Layout(width="95%"),
    "style": {"description_width": "initial"},
}

from msticpy.sectools import auditdextract
from msticpy.sectools.cmd_line import *
from msticpy.sectools.ip_utils import convert_to_ip_entities
from msticpy.sectools.syslog_utils import *
from msticpy.sectools.syslog_utils import create_host_record, cluster_syslog_logons_df, risky_sudo_sessions


### Get WorkspaceId and Authenticate to Log Analytics
 <details>
    <summary>&nbsp;<u>Details...</u></summary>
If you are using user/device authentication, run the following cell. 
- Click the 'Copy code to clipboard and authenticate' button.
- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. 
- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. 
- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.

Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:
```
%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)
```
instead of
```
%kql loganalytics://code().workspace(WORKSPACE_ID)
```

Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.<br>
On successful authentication you should see a ```popup schema``` button.
To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.
 </details>
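The two connection string forms described above can be sketched as a small helper. This is a minimal illustration only: `build_connection_string` is a hypothetical name that is not part of msticpy or Kqlmagic, and wrapping the literal values in single quotes is an assumption about how the values should be embedded.

```python
def build_connection_string(workspace_id, tenant_id=None, client_id=None, client_secret=None):
    """Return a Log Analytics connection string (hypothetical helper, not a library API)."""
    if client_id and client_secret:
        # AppId and secret authentication also needs the AAD tenant
        if not tenant_id:
            raise ValueError("tenant_id is required for AppId/secret authentication")
        return (
            f"loganalytics://tenant('{tenant_id}').workspace('{workspace_id}')"
            f".clientid('{client_id}').clientsecret('{client_secret}')"
        )
    # Default: user/device code authentication
    return f"loganalytics://code().workspace('{workspace_id}')"


device_code_str = build_connection_string("my-workspace-id")
app_secret_str = build_connection_string(
    "my-workspace-id", tenant_id="my-tenant", client_id="my-app", client_secret="my-secret"
)
```

Avoid hard-coding a real client secret in a notebook; read it from an environment variable or a key vault instead.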

In [None]:
# See if we have an Azure Sentinel Workspace defined in our config file; if not, let the user specify Workspace and Tenant IDs
from msticpy.nbtools.wsconfig import WorkspaceConfig
ws_config = WorkspaceConfig()
try:
    ws_id = ws_config['workspace_id']
    ten_id = ws_config['tenant_id']
    display(HTML("Workspace details collected from config file"))
    config = True
except Exception:
    display(HTML('Please go to your Log Analytics workspace, copy the workspace ID'
                 ' and/or tenant ID and paste them here to enable connection to the workspace and querying of it.<br> '))
    ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',
                                        prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)
    ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',
                                         prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)
    config = False


In [None]:
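# Establish a query provider for Azure Sentinel and connect to it
qry_prov = QueryProvider('LogAnalytics')
if config is False:
    # Use the values entered in the widgets above, since the config file had no workspace entries
    ws_id = ws_id.value
    ten_id = ten_id.value
    qry_prov.connect(connection_str=f"loganalytics://code().tenant('{ten_id}').workspace('{ws_id}')")
else:
    qry_prov.connect(connection_str=ws_config.code_connect_str)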


## Set Hunting Time Frame
To begin the hunt we need to set the time frame in which to test the compromised-host hunting hypothesis. Use the widget below to select the start and end time for the hunt.

In [None]:
query_times = nbwidgets.QueryTime(units='day',
                                  max_before=20, max_after=1, before=3)
query_times.display()

### Select Host to Investigate
Select the host you want to test your hunting hypothesis against. Only hosts with Syslog data within the time frame you specified are available. If the host you wish to select is not present, try adjusting your time frame.
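The host lookup in the cells that follow relies on the KQL `contains` operator, which is a case-insensitive substring match. A minimal local sketch of the same matching, using pandas and a hypothetical `match_hosts` helper with made-up host names, looks like this:

```python
import pandas as pd


def match_hosts(computers, search_term):
    """Return the distinct host names containing search_term, ignoring case (illustrative helper)."""
    names = pd.Series(computers, dtype="object").dropna().drop_duplicates()
    # regex=False treats the search term as a literal string rather than a pattern
    mask = names.str.contains(search_term, case=False, regex=False)
    return sorted(names[mask].tolist())


matches = match_hosts(
    ["WebSrv01", "websrv02.contoso.com", "db01", "WebSrv01"], "websrv"
)
```

A short search term can therefore match several hosts, which is why the notebook asks you to pick one when more than one name comes back.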

In [None]:
host_text = widgets.Text(
    description="Enter the Host name to search for:", **WIDGET_DEFAULTS
)
display(host_text)

In [None]:
hostname = None
items = []
hosts_query = f""" Syslog | where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) 
                | where Computer contains "{host_text.value}" | distinct Computer | limit 490000"""
print("Collecting details on available hosts...")
hosts_df = qry_prov._query_provider.query(query=hosts_query)
if isinstance(hosts_df, pd.DataFrame) and not hosts_df.empty:
    items = hosts_df["Computer"].unique().tolist()

if len(items) > 1:
    print(f"Multiple matches for '{host_text.value}'. Please select a host from the list.")
    choose_host = nbwidgets.SelectString(
        item_list=items,
        description="Select the host.",
        auto_display=True,
    )
    
elif items:
    hostname = items[0]
    md(f"Unique host found: {hostname}")
else:
    md(f"Host not found: {host_text.value}")

## Host Summary
Below is an overview of the selected host based on available data sources.

In [None]:
print("Collecting host details. This may take a few minutes...")
if not hostname:
    hostname = choose_host.value
# Collect data on the host
syslog_query = f""" Syslog | where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) 
                | where Computer contains "{hostname}" """
all_syslog = qry_prov.exec_query(query=syslog_query)
syslog_data = all_syslog[all_syslog['Computer'] == f'{hostname}']
heartbeat_query = f"""Heartbeat | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end})| where Computer == '{hostname}' | top 1 by TimeGenerated desc nulls last"""
# Default to None so the check below works when the Azure Network Analytics table is absent
az_net_df = None
if "AzureNetworkAnalytics_CL" in qry_prov.schema:
    aznet_query = f"""AzureNetworkAnalytics_CL | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end}) | where VirtualMachine_s has '{hostname}' | where ResourceType == 'NetworkInterface' | top 1 by TimeGenerated desc | project PrivateIPAddresses = PrivateIPAddresses_s, PublicIPAddresses = PublicIPAddresses_s"""
    az_net_df = qry_prov.exec_query(query=aznet_query)
host_hb = qry_prov.exec_query(query=heartbeat_query)

# Create host entity record, with Azure network data if any is available
if isinstance(az_net_df, pd.DataFrame):
    host_entity = create_host_record(
        syslog_df=syslog_data, heartbeat_df=host_hb, az_net_df=az_net_df)
else:
    host_entity = create_host_record(
        syslog_df=syslog_data, heartbeat_df=host_hb)

display(
    Markdown(
        "***Host Details***\n\n"
        f"**Hostname**: {host_entity.computer} \n\n"
        f"**OS**: {host_entity.OSType} {host_entity.OSName}\n\n"
        f"**IP Address**: {host_entity.IPAddress.Address}\n\n"
        f"**Location**: {host_entity.IPAddress.Location.CountryName}\n\n"
        f"**Installed Applications**: {host_entity.Applications}\n\n"
    )
)
rel_alert_select = None
sudo_events = None

### Host Alerts
This section provides an overview of any security alerts in Azure Sentinel related to this host. This will help scope and guide our hunt.

In [None]:
related_alerts = qry_prov.SecurityAlert.list_related_alerts(
    query_times, host_name=hostname)

if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:
    host_alert_items = (related_alerts[['AlertName', 'TimeGenerated']]
                        .groupby('AlertName').TimeGenerated.agg('count').to_dict())

    def print_related_alerts(alertDict, entityType, entityName):
        if len(alertDict) > 0:
            md(f"### Found {len(alertDict)} different alert types related to this {entityType} (\'{entityName}\')")
            for (k, v) in alertDict.items():
                md(f"- {k}, Count of alerts: {v}")
        else:
            md(f"No alerts for {entityType} entity \'{entityName}\'")


    # Display alerts on timeline to aid in visual grouping
    print_related_alerts(host_alert_items, 'host', host_entity.HostName)
    x = nbdisplay.display_timeline(
        data=related_alerts, source_columns=["AlertName"], title="Host alerts over time", height=300, color="red")
else:
    md('No related alerts found.')

In [None]:
rel_alert_select = None

def show_full_alert(selected_alert):
    global security_alert, alert_ip_entities
    security_alert = SecurityAlert(
        rel_alert_select.selected_alert)
    nbdisplay.display_alert(security_alert, show_entities=True)

# Show selected alert when selected
if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:
    related_alerts['CompromisedEntity'] = related_alerts['Computer']
    md('### Click on alert to view details.')
    rel_alert_select = nbwidgets.AlertSelector(alerts=related_alerts,
                                               action=show_full_alert)
    rel_alert_select.display()
else:
    md('No related alerts found.')

## Re-scope Hunting Time Frame
Based on the security alerts for this host we can choose to re-scope our hunting time frame.
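As a rough illustration of re-scoping, the sketch below centres a new window on an alert time. The `rescope_window` helper and its default offsets are hypothetical and only mimic the idea; in the notebook the actual window is chosen interactively with the widget in the next cell.

```python
from datetime import datetime, timedelta


def rescope_window(origin, before_hours=6, after_hours=1):
    """Return (start, end) around origin; the defaults are illustrative only."""
    start = origin - timedelta(hours=before_hours)
    end = origin + timedelta(hours=after_hours)
    return start, end


# Example alert time (made-up value)
alert_time = datetime(2020, 1, 15, 12, 0, 0)
window_start, window_end = rescope_window(alert_time)
```

Looking further back than forward from the alert is a common choice, since the activity that triggered an alert usually precedes it.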

In [None]:
if rel_alert_select is None or rel_alert_select.selected_alert is None:
    start = query_times.start
else:
    start = rel_alert_select.selected_alert['TimeGenerated']

# Set new investigation time windows based on the selected alert
invest_times = nbwidgets.QueryTime(units='hours',
                                       max_before=24, max_after=12, before=6, origin_time=start)
invest_times.display()

## How to use this Notebook
Whilst this notebook is linear in layout it doesn't need to be linear in usage. We have selected our host to investigate and set an initial hunting time frame to work within. We can now start to test more specific hunting hypotheses with the aim of validating our broader initial hunting hypothesis. To do this we can start by looking at:
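- <a href="#Host-Logon-Events">Host Logon Events</a>
- <a href="#User-Activity">User Activity</a>
- <a href="#Application-Activity">Application Activity</a>
- <a href="#Network-Activity">Network Activity</a>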

You can choose to start below with a hunt in host logon events or jump to one of the other sections listed above. The order in which you run these major sections doesn't matter, as each is self-contained. You may also choose to rerun sections based on your findings from running other sections.

## Host Logon Events
**Hypothesis:** That an attacker has gained legitimate access to the host via compromised credentials and has logged into the host to conduct malicious activity. 

This section provides an overview of logon activity for the host within our hunting time frame. Its purpose is to allow for the identification of anomalous logons or attempted logons.
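One simple way to surface anomalous logon attempts is to count failed logons per source address. The sketch below assumes a DataFrame with the `User`, `SourceIP` and `LogonResult` columns used elsewhere in this notebook; the `failed_logons_by_source` helper, its threshold and the sample rows are all illustrative.

```python
import pandas as pd


def failed_logons_by_source(logon_events, min_failures=3):
    """Return source IPs with at least min_failures failed logons (illustrative helper)."""
    failed = logon_events[logon_events["LogonResult"] == "Failure"]
    counts = (
        failed.groupby("SourceIP")
        .agg(FailedLogons=("User", "size"), DistinctUsers=("User", "nunique"))
        .reset_index()
    )
    flagged = counts[counts["FailedLogons"] >= min_failures]
    return flagged.sort_values("FailedLogons", ascending=False).reset_index(drop=True)


# Made-up sample data using documentation-range IP addresses
sample_events = pd.DataFrame(
    {
        "User": ["root", "admin", "root", "alice", "alice"],
        "SourceIP": ["203.0.113.5"] * 3 + ["198.51.100.7"] * 2,
        "LogonResult": ["Failure", "Failure", "Failure", "Success", "Failure"],
    }
)
suspect_sources = failed_logons_by_source(sample_events)
```

Many failures from one address across several distinct accounts is a common sign of password guessing, although the right threshold depends on the environment.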

In [None]:
# Collect logon events for this host, separate them into successful and unsuccessful, and cluster the successful ones into sessions
logon_events = qry_prov.LinuxSyslog.user_logon(invest_times, host_name=hostname)
remote_logons = None
failed_logons = None
logon_sessions_df = None
if isinstance(logon_events, pd.DataFrame) and not logon_events.empty:
    try:
        remote_logons = (logon_events[logon_events['LogonResult'] == 'Success'])
        failed_logons = (logon_events[logon_events['LogonResult'] == 'Failure'])
        logon_sessions_df = cluster_syslog_logons_df(logon_events)
    except:
        print("No logon sessions in this timeframe")
else:
    print("No logon events in this timeframe")


if (remote_logons is not None and not remote_logons.empty) or (failed_logons is not None and not failed_logons.empty):
    # Provide a timeline of successful and failed logon attempts to aid identification of potential brute force attacks
    display(Markdown('### Timeline of successful and failed host logons.'))
    tl_data = {"Remote Logons": {"data": remote_logons, "source_columns": ['User', 'ProcessName', 'SourceIP'], "color": "Green"},
               "Failed Logons": {"data": failed_logons, "source_columns": ['User', 'ProcessName', 'SourceIP'], "time_column": "TimeGenerated", "color": "Red"}}
    # Only pass an alert to the timeline if an alert selector was created earlier
    selected_alert = rel_alert_select.selected_alert if rel_alert_select is not None else None
    logon_timeline = nbdisplay.display_timeline(
        data=tl_data, height=300, alert=selected_alert)
    palette = viridis(5)
    # Graph out failed/successful logons by account and by logon process
    all_df = pd.DataFrame(dict(successful=remote_logons['ProcessName'].value_counts(
    ), failed=failed_logons['ProcessName'].value_counts())).fillna(0)
    fail_data = pd.value_counts(failed_logons['User'].values, sort=True).head(
        10).reset_index(name='value').rename(columns={'User': 'Count'})
    fail_pie = None
    sucess_pie = None
    if not fail_data.empty:
        fail_pie = fail_data.plot_bokeh.pie(x='index', y="value", colormap=palette,
                                            show_figure=False, title="Relative Frequencies of Failed Logons by Account")
    sucess_data = pd.value_counts(remote_logons['User'].values, sort=False).reset_index(
        name='value').rename(columns={'User': 'Count'})
    if not sucess_data.empty:
        sucess_pie = sucess_data.plot_bokeh.pie(x='index', colormap=palette, y="value",
                                                show_figure=False, title="Relative Frequencies of Successful Logons by Account")
    processes = all_df.index.values.tolist()
    fail_sucess_data = pd.DataFrame({'processes': processes,
                                     'sucess': all_df['successful'].values.tolist(),
                                     'failure': all_df['failed'].values.tolist()})

    process_bar = fail_sucess_data.plot_bokeh.bar(
        x="processes", colormap=palette,  show_figure=False, title="Failed and Successful logon attempts by process")
    pandas_bokeh.plot_grid(
        [[fail_pie, sucess_pie], [process_bar]], plot_width=450, plot_height=300)

    # Convert logon IPs to IP entities in order to get location
    ip_entity = entityschema.IpAddress()
    # Is there a better way to do this rather than resetting the list each time?
    ip_list = []
    for ip_logon in remote_logons['SourceIP']:
        ip_list.extend(convert_to_ip_entities(ip_logon))
    ip_fail_list = []
    for ip_fail in failed_logons['SourceIP']:
        ip_fail_list.extend(convert_to_ip_entities(ip_fail))

    # Get center location of all IP locations to set map default
    location = get_map_center(ip_list + ip_fail_list)
    folium_map = FoliumMap(location=location, zoom_start=4)

    # Map logon locations to allow for identification of anomalous locations
    if len(ip_fail_list) > 0:
        display(HTML('<h3>Map of Originating Location of Logon Attempts</h3>'))
        icon_props = {'color': 'red'}
        folium_map.add_ip_cluster(ip_entities=ip_fail_list, **icon_props)
    if len(ip_list) > 0:
        icon_props = {'color': 'green'}
        folium_map.add_ip_cluster(ip_entities=ip_list, **icon_props)
    display(folium_map.folium_map)
    display(Markdown('<p style="color:red">Warning: the folium mapping library '
                         'does not display correctly in some browsers.</p><br>'
                         'If you see a blank image please retry with a different browser.'))

### Logon Sessions
Based on the detail above, if you wish to focus your hunt on a particular user, jump to the [User Activity](#User-Activity) section. Alternatively, to further refine the hunt, select a logon session to view in more detail. Select a session from the list below to continue. Sessions that occurred at the time an alert was raised for this host, or where the user has an abnormal ratio of failed to successful logon attempts, are highlighted.
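The successful-to-failed logon ratio used for highlighting can also be computed per user in one pass. This is a sketch under the same convention as the notebook code (a ratio of 1 when a user has no failed logons); the `logon_ratio_by_user` helper and the sample rows are illustrative and are not part of msticpy.

```python
import pandas as pd


def logon_ratio_by_user(logon_events):
    """Return successful/failed logon ratio per user, 1.0 where there are no failures."""
    counts = (
        logon_events.groupby(["User", "LogonResult"])
        .size()
        .unstack(fill_value=0)
        .reindex(columns=["Success", "Failure"], fill_value=0)
    )
    # Mask zero failure counts so the division yields NaN instead of inf
    failures = counts["Failure"].where(counts["Failure"] > 0)
    return (counts["Success"] / failures).fillna(1.0)


# Made-up sample data
sample_logons = pd.DataFrame(
    {
        "User": ["alice"] * 6 + ["bob"] * 3,
        "LogonResult": ["Success"] * 2 + ["Failure"] * 4 + ["Success"] * 3,
    }
)
ratios_by_user = logon_ratio_by_user(sample_logons)
```

A low ratio means many failures for each success, which can indicate password guessing against that account.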

In [None]:
import datetime as dt
def to_utc(time):
    ts = (time - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')
    time = dt.datetime.utcfromtimestamp(ts) 
    return time
if logon_sessions_df is not None:
    logon_sessions_df["Alerts during session?"] = np.nan
    # check if any alerts occur during logon window.
    logon_sessions_df['Start (UTC)'] = [(to_utc(time) - dt.timedelta(seconds=5)) for time in logon_sessions_df['Start']]
    logon_sessions_df['End (UTC)'] = [(to_utc(time) + dt.timedelta(seconds=5)) for time in logon_sessions_df['End']]

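    # Only check for overlapping alerts if any related alerts were found
    if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:
        for TimeGenerated in related_alerts['TimeGenerated']:
            logon_sessions_df.loc[(TimeGenerated >= logon_sessions_df['Start (UTC)']) & (TimeGenerated <= logon_sessions_df['End (UTC)']), "Alerts during session?"] = "Yes"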

    logon_sessions_df.loc[logon_sessions_df['User'] == 'root', "Root?"] = "Yes"
    logon_sessions_df.replace(np.nan, "No", inplace=True)

    ratios = []
    for _, row in logon_sessions_df.iterrows():
        suc_fail = logon_events.apply(lambda x: True if x['User'] == row['User'] and x["LogonResult"] == 'Success' else(
            False if x['User'] == row['User'] and x["LogonResult"] == 'Failure' else None), axis=1)
        numofsucess = len(suc_fail[suc_fail == True].index)
        numoffail = len(suc_fail[suc_fail == False].index)
        if numoffail == 0:
            ratio = 1
        else:
            ratio = numofsucess/numoffail
        ratios.append(ratio)
    logon_sessions_df["Successful to failed logon ratio"] = ratios

    def color_cells(val):
        if isinstance(val, str):
            color = 'yellow' if val == "Yes" else 'white'
        elif isinstance(val, float):
            color = 'yellow' if val > 0.5 else 'white'
        else:
            color = 'white'
        return f'background-color: {color}'

    display(logon_sessions_df[['User','Start (UTC)', 'End (UTC)', 'Alerts during session?', 'Successful to failed logon ratio', 'Root?']]
                        .style.applymap(color_cells).hide_index())

    logon_items = logon_sessions_df[['User','Start (UTC)', 'End (UTC)']].to_string(header=False,
                      index=False,
                      index_names=False).split('\n')
    logon_sessions_df["Key"] = logon_items
    logon_sessions_df.set_index('Key', inplace=True)
    logon_dict = logon_sessions_df[['User','Start (UTC)', 'End (UTC)']].to_dict('index')

    logon_selection = nbwidgets.SelectString(description='Select logon session to investigate: ',
                                                 item_dict=logon_dict , width='80%', auto_display=True)
else:
    md("No logon sessions during this timeframe")

#### Session Details

In [None]:
def view_syslog(selected_facility):
    display(syslog_events.query('Facility == @selected_facility'))

# Produce a summary of user modification actions taken
def action_count(x):
    if "Add" in x:
        return len(add_events.replace("", np.nan).dropna(subset=['User'])['User'].unique().tolist())
    elif "Modify" in x:
        return len(mod_events.replace("", np.nan).dropna(subset=['User'])['User'].unique().tolist())
    elif "Delete" in x:
        return len(del_events.replace("", np.nan).dropna(subset=['User'])['User'].unique().tolist())
    else:
        return ""
sudo_sessions = None
tooltip_cols = ['SyslogMessage']
if logon_sessions_df is not None:
    #Collect data based on the session selected for investigation
    invest_sess = {'StartTimeUtc': logon_selection.value.get('Start (UTC)'), 'EndTimeUtc': logon_selection.value.get(
        'End (UTC)'), 'Account': logon_selection.value.get('User'), 'Host': hostname}
    session = entityschema.HostLogonSession(invest_sess)
    syslog_events = qry_prov.LinuxSyslog.all_syslog(
        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)
    sudo_events = qry_prov.LinuxSyslog.sudo_activity(
        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host, user=session.Account)

    if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:
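        # Assign the result back; an in-place replace on a column selection only changes a temporary copy
        sudo_events[['Command', 'CommandCall']] = sudo_events[['Command', 'CommandCall']].replace('', np.nan)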
        try:
            sudo_sessions = cluster_syslog_logons_df(logon_events=(sudo_events))
        except:
            pass

    # Display summary of cron activity in session
    cron_events = qry_prov.LinuxSyslog.cron_activity(
        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)
    if not isinstance(cron_events, pd.DataFrame):
        display(HTML(
            f'<h3> No Cron activity for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}</h3>'))
        crn_tl_data = {}
    else:

        cron_events['CMD'] = cron_events['CMD'].replace('', np.nan)

        crn_tl_data = {"Cron Executions": {"data": cron_events[['TimeGenerated', 'CMD', 'CronUser', 'SyslogMessage']].dropna(), "source_columns": tooltip_cols, "color": "Blue"},
                       "Cron Edits": {"data": cron_events.loc[cron_events['SyslogMessage'].str.contains('EDIT')], "source_columns": tooltip_cols, "color": "Green"}}

        display(HTML('<h2> Most common commands run by cron:</h2>'))
        display(HTML(
            'This shows how often each cron job was executed within the specified time window'))
        cron_commands = (cron_events[['EventTime', 'CMD']]
                         .groupby(['CMD']).count()
                         .dropna()
                         .style
                         .set_table_attributes('width=900px, text-align=center')
                         .background_gradient(cmap='Reds', low=0.5, high=1)
                         .format("{0:0>1.0f}"))
        display(cron_commands)

    # Display summary of user and group creations, deletions and modifications during the session
    user_activity = qry_prov.LinuxSyslog.user_group_activity(
        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)

    if not isinstance(user_activity, pd.DataFrame):
        display(HTML(
            f' No user or group modifications for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}'))
    else:
        add_events = user_activity[user_activity['UserGroupAction'].str.contains(
            'Add')]
        del_events = user_activity[user_activity['UserGroupAction'].str.contains(
            'Delete')]
        mod_events = user_activity[user_activity['UserGroupAction'].str.contains(
            'Modify')]
        user_activity['Count'] = user_activity.groupby('UserGroupAction')['UserGroupAction'].transform('count')
        if add_events.empty and del_events.empty and mod_events.empty:
            display(HTML('<h2> Users and groups added or deleted:</h2>'))
            display(HTML(
                f'No users or groups were added or deleted on {host_entity.HostName} between {query_times.start} and {query_times.end}'))
            user_tl_data = {}
        else:
            display(HTML("<h2>Users added, modified or deleted</h2>"))
            display(user_activity[['UserGroupAction','Count']].drop_duplicates().style.hide_index())
            account_actions = pd.DataFrame({"User Additions": [add_events.replace("", np.nan).dropna(subset=['User'])['User'].unique().tolist()],
                                            "User Modifications": [mod_events.replace("", np.nan).dropna(subset=['User'])['User'].unique().tolist()],
                                            "User Deletions": [del_events.replace("", np.nan).dropna(subset=['User'])['User'].unique().tolist()]})
            display(account_actions.style.hide_index())
            user_tl_data = {"User adds": {"data": add_events, "source_columns": tooltip_cols, "color": "Orange"},
                            "User deletes": {"data": del_events, "source_columns": tooltip_cols, "color": "Red"},
                            "User modifications": {"data": mod_events, "source_columns": tooltip_cols, "color": "Grey"}}
        # Display sudo activity during session
        if sudo_sessions is None:
            md(f"No Sudo sessions for {session.Host} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}")
            sudo_tl_data = {}
        else:
            sudo_start = sudo_events[sudo_events["SyslogMessage"].str.contains(
                "pam_unix.+session opened")].rename(columns={"Sudoer": "User"})
            sudo_tl_data = {"Host logons": {"data": remote_logons, "source_columns": tooltip_cols, "color": "Cyan"},
                            "Sudo sessions": {"data": sudo_start, "source_columns": tooltip_cols, "color": "Purple"}}
            try:
                risky_actions = cmd_line.risky_cmd_line(events=sudo_events, log_type="Syslog")
                suspicious_events = cmd_speed(
                    cmd_events=sudo_events, time=60, events=2, cmd_field="Command")
            except Exception:
                risky_actions = None
                suspicious_events = None
            if risky_actions is None and suspicious_events is None:
                pass
            else:
                risky_sessions = risky_sudo_sessions(
                    risky_actions=risky_actions, sudo_sessions=sudo_sessions, suspicious_actions=suspicious_events)
                for key in risky_sessions:
                    if key in sudo_sessions:
                        sudo_sessions[f"{key} - {risky_sessions[key]}"] = sudo_sessions.pop(
                            key)

        if sudo_events.empty:
            md(f"No successful sudo activity for {hostname} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}")
        else:
            sudo_events.replace("", np.nan, inplace=True)
            display(HTML('<h2> Frequency of sudo commands</h2>'))
            display(HTML('This shows how many times each command has been run with sudo. /bin/bash is usually associated with the use of "sudo -i"'))
            sudo_commands = (sudo_events[['EventTime', 'CommandCall']]
                             .groupby(['CommandCall'])
                             .count()
                             .dropna()
                             .style
                             .set_table_attributes('width=900px, text-align=center')
                             .background_gradient(cmap='Reds', low=.5, high=1)
                             .format("{0:0>3.0f}"))
            display(sudo_commands)

    # Display a timeline of all activity during session
    crn_tl_data.update(user_tl_data)
    crn_tl_data.update(sudo_tl_data)
    display(HTML('<h2> Session Timeline.</h2>'))
    nbdisplay.display_timeline(
        data=crn_tl_data, title='Session Timeline', height=300)
else:
    md("No logon sessions during this timeframe")

#### Raw data from user session
Use this syslog message data to further investigate suspicious activity during the session.

In [None]:
if logon_sessions_df is not None:
    # Return syslog data and present it to the user for investigation
    session_syslog = qry_prov.LinuxSyslog.all_syslog(
        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)
    if session_syslog.empty:
        display(HTML(
            f' No syslog for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}'))


    def view_sudo(selected_cmd):
        display(sudo_events.query('CommandCall == @selected_cmd')[
                ['TimeGenerated', 'SyslogMessage', 'Sudoer', 'SudoTo', 'Command', 'CommandCall']])

    # Show syslog messages associated with selected sudo command
    display(HTML("<h3>View all messages associated with a sudo command</h3>"))
    items = sudo_events['CommandCall'].dropna().unique().tolist()
    cmd_w = widgets.Dropdown(
        options=items, description='Select sudo command to examine', disabled=False, **WIDGET_DEFAULTS)
    display(widgets.interactive(view_sudo, selected_cmd=cmd_w))
else:
    md("No logon sessions during this timeframe")

In [None]:
if logon_sessions_df is not None:
    # Display syslog messages from the session with the facility selected
    items = syslog_events['Facility'].dropna().unique().tolist()
    display(HTML("<h3>View all messages associated with a syslog facility</h3>"))
    sess_w = widgets.Dropdown(
        options=items, description='Select syslog facility to examine', disabled=False, **WIDGET_DEFAULTS)
    display(widgets.interactive(view_syslog, selected_facility=sess_w))
else:
    md("No logon sessions during this timeframe")

### Process Tree from session

In [None]:
if logon_sessions_df is not None:
    display(HTML("<h3>Process Trees from session</h3>"))
    print("Building process tree, this may take some time...")
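    # Find the table containing auditd data; default to None so the
    # check below does not fail when no matching table exists
    audit_table = None
    regex = r'.*audit.*\_cl?'
    matches = ((re.match(regex, key, re.IGNORECASE)) for key in qry_prov.schema)
    for match in matches:
        if match is not None:
            audit_table = match.group(0)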

    # Retrieve auditd data
    if audit_table:
        audit_data = qry_prov.LinuxAudit.auditd_all(
            start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=hostname
        )
        if isinstance(audit_data, pd.DataFrame) and not audit_data.empty:
            audit_events = auditdextract.extract_events_to_df(
                data=audit_data
            )

            process_tree = auditdextract.generate_process_tree(audit_data=audit_events)
            process_tree.mp_process_tree.plot()
        else:
            display(HTML("No auditd data available to build process tree"))
    else:
        display(HTML("No auditd data available to build process tree"))
else:
    md("No logon sessions during this timeframe")

Click [here](#app) to start a process- or application-focused hunt, or continue with the session-based hunt below by selecting a sudo session to investigate.

### Sudo Session Investigation
Sudo activity is often required by an attacker to conduct actions on target, and more granular data is available for sudo sessions, allowing for deeper hunting within these sessions.
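
As a rough illustration of the extra detail that sudo log lines carry, the sketch below pulls the invoking user, the target user and the command out of a single message. It is not the parser used by the queries in this notebook: the message layout, the `SUDO_PATTERN` regex and the `parse_sudo_message` helper are assumptions made for the example, and real messages may differ between distributions and sudo versions.

```python
import re

# Assumed layout of a sudo syslog line (hypothetical pattern for illustration):
#   "<sudoer> : TTY=... ; PWD=... ; USER=<target> ; COMMAND=<command>"
SUDO_PATTERN = re.compile(
    r"^\s*(?P<sudoer>\S+)\s*:\s*"
    r"(?:.*?;\s*)?USER=(?P<sudo_to>\S+)\s*;\s*"
    r"COMMAND=(?P<command>.+)$"
)


def parse_sudo_message(message):
    """Return the sudoer, target user and command from a sudo log line, or None."""
    match = SUDO_PATTERN.match(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["command"] = fields["command"].strip()
    return fields


example = "alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/bin/bash -c id"
parsed = parse_sudo_message(example)
print(parsed)
```

Messages that do not follow this layout, such as PAM session open and close lines, return `None` and can be handled separately.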

In [None]:
if logon_sessions_df is not None and sudo_sessions is not None:
    sudo_items = sudo_sessions[['User','Start', 'End']].to_string(header=False,
                      index=False,
                      index_names=False).split('\n')
    sudo_sessions["Key"] = sudo_items
    sudo_sessions.set_index('Key', inplace=True)
    sudo_dict = sudo_sessions[['User','Start', 'End']].to_dict('index')

    sudo_selection = nbwidgets.SelectString(description='Select sudo session to investigate: ',
                                                item_dict=sudo_dict, width='100%', height='300px', auto_display=True)
else:
    sudo_selection = None
    md("No logon sessions during this timeframe")

Load TILookup class
> **Note**: to use TILookup you will need configuration settings in your msticpyconfig.yaml
> <br>see [TIProviders documentation](https://msticpy.readthedocs.io/en/latest/TIProviders.html)
> <br>and [Configuring Notebook Environment notebook](./ConfiguringNotebookEnvironment.ipynb)
> <br>or [ConfiguringNotebookEnvironment (GitHub static view)](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)

In [None]:
tilookup = TILookup()

In [None]:
#Collect data associated with the sudo session selected
from msticpy.sectools.tiproviders.ti_provider_base import TISeverity

def ti_check_sev(severity, threshold):
    severity = TISeverity.parse(severity)
    threshold = TISeverity.parse(threshold)
    return severity.value >= threshold.value

if sudo_selection:
    sudo_sess = {'StartTimeUtc': sudo_selection.value.get('Start'), 'EndTimeUtc': sudo_selection.value.get(
        'End'), 'Account': sudo_selection.value.get('User'), 'Host': hostname}
    sudo_session = entityschema.HostLogonSession(sudo_sess)
    sudo_events = qry_prov.LinuxSyslog.sudo_activity(start=sudo_session.StartTimeUtc.round(
        '-1s') - pd.Timedelta(seconds=1), end=(sudo_session.EndTimeUtc.round('1s')+ pd.Timedelta(seconds=1)), host_name=sudo_session.Host)
    if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:
        display(sudo_events.replace('', np.nan).dropna(axis=0, subset=['Command'])[
                ['TimeGenerated', 'Command', 'CommandCall', 'SyslogMessage']])
        # Extract IOCs from the data
        ioc_extractor = iocextract.IoCExtract()
        os_family = host_entity.OSType if host_entity.OSType else 'Linux'
        print('Extracting IoCs.......')
        ioc_df = ioc_extractor.extract(data=sudo_events,
                                       columns=['SyslogMessage'],
                                       os_family=os_family,
                                       ioc_types=['ipv4', 'ipv6', 'dns', 'url',
                                                  'md5_hash', 'sha1_hash', 'sha256_hash'])
        if len(ioc_df) > 0:
            ioc_count = len(
                ioc_df[["IoCType", "Observable"]].drop_duplicates())
            md(f"Found {ioc_count} IOCs")
            #Lookup the extracted IOCs in TI feed
            ti_resps = tilookup.lookup_iocs(data=ioc_df[["IoCType", "Observable"]].drop_duplicates(
            ).reset_index(), obs_col='Observable', ioc_type_col='IoCType')
            i = 0
            ti_hits = []
            ti_resps.reset_index(drop=True, inplace=True)
            while i < len(ti_resps):
                if ti_resps['Result'][i] == True and ti_check_sev(ti_resps['Severity'][i], 1):
                    ti_hits.append(ti_resps['Ioc'][i])
                    i += 1
                else:
                    i += 1
            md(f"Found {len(ti_hits)} IoCs in Threat Intelligence")
            for ioc in ti_hits:
                md(f"Messages containing IoC found in TI feed: {ioc}")
                # Match the IoC literally; dots in IPs and domains are regex wildcards otherwise
                display(sudo_events[sudo_events['SyslogMessage'].str.contains(
                    ioc, regex=False)][['TimeGenerated', 'SyslogMessage']])
        else:
            md("No IoC patterns found in Syslog Messages.")
    else:
        md('No sudo messages for this session')


else:
    md("No Sudo session to investigate")

Jump to:
- <a>Host Logon Events</a>
- <a>Application Activity</a>
- <a>Network Activity</a>

<a></a>
## User Activity
**Hypothesis:** That an attacker has gained access to the host and is using a user account to conduct actions on the host.

This section provides an overview of activity by user within our hunting time frame, so that anomalous activity by a user can be identified. This hunt can be driven by investigation of suspected users or run as a hunt across all users seen on the host.
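
One simple way to decide which user to look at first is to compare failed and successful logons per account. The sketch below is a minimal, standard-library illustration of that idea, not part of the notebook's own logic: `summarize_logons` and `users_by_failure_ratio` are hypothetical helpers, and the `'Success'`/`'Failure'` labels are assumed to mirror the `LogonResult` values used in the cells that follow.

```python
from collections import Counter


def summarize_logons(events):
    """Count logon results per user.

    events: iterable of (user, result) pairs, where result is
    'Success' or 'Failure'. Events with an empty user are skipped.
    """
    summary = {}
    for user, result in events:
        if not user:
            continue
        summary.setdefault(user, Counter())[result] += 1
    return summary


def users_by_failure_ratio(summary):
    """Rank users by the share of their logon attempts that failed."""
    ranked = []
    for user, counts in summary.items():
        total = counts["Success"] + counts["Failure"]
        ratio = counts["Failure"] / total if total else 0.0
        ranked.append((user, ratio))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


sample_events = [
    ("alice", "Success"),
    ("bob", "Failure"),
    ("bob", "Failure"),
    ("bob", "Success"),
    ("", "Failure"),
]
logon_summary = summarize_logons(sample_events)
ranking = users_by_failure_ratio(logon_summary)
print(ranking)
```

A high failure ratio is only a starting point; a user with few attempts can rank highly without being suspicious.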

In [None]:
# Get list of users with logon or sudo sessions on host
logon_events = qry_prov.LinuxSyslog.user_logon(query_times, host_name=hostname)
users = logon_events['User'].replace('', np.nan).dropna().unique().tolist()
all_users = list(users)


if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:
    sudoers = sudo_events['Sudoer'].replace(
        '', np.nan).dropna().unique().tolist()
    all_users.extend(x for x in sudoers if x not in all_users)

# Pick Users
if not logon_events.empty:
    user_select = nbwidgets.SelectString(description='Select user to investigate: ',
                                             item_list=all_users, width='75%', auto_display=True)
else:
    md("There was no user activity in the timeframe specified.")
    user_select = None

In [None]:
folium_user_map = FoliumMap()

def view_sudo(cmd):
    display(user_sudo_hold.query('CommandCall == @cmd')[
            ['TimeGenerated', 'HostName', 'Command', 'CommandCall', 'SyslogMessage']])
user_sudo_hold = None
if user_select is not None:
    # Get all syslog relating to these users
    username = user_select.value
    user_events = all_syslog[all_syslog['SyslogMessage'].str.contains(username)]
    logon_sessions = cluster_syslog_logons_df(logon_events)

    # Display all logons associated with the user
    display(HTML(f"<h1> User Logon Activity for {username}</h1>"))
    user_logon_events = logon_events.loc[logon_events['User'] == username]
    try:
        user_logon_sessions = cluster_syslog_logons_df(user_logon_events)
    except Exception:
        user_logon_sessions = None
    
    user_remote_logons = (
        user_logon_events[user_logon_events['LogonResult'] == 'Success']
    )
    user_failed_logons = (
        user_logon_events[user_logon_events['LogonResult'] == 'Failure']
    )
    if not user_remote_logons.empty:
        for _, row in logon_sessions_df.iterrows():
            end = row['End']
        user_sudo_events = qry_prov.LinuxSyslog.sudo_activity(start=user_remote_logons.sort_values(
            by='TimeGenerated')['TimeGenerated'].head(1).values[0], end=end, host_name=hostname, user=username)
    else: 
        user_sudo_events = None

    if user_logon_sessions is None and user_remote_logons.empty and user_failed_logons.empty:
        pass
    else:
        display(HTML(
            f"{len(user_remote_logons)} successful logons and {len(user_failed_logons)} failed logons for {username}"))

        display(Markdown('### Timeline of host logon attempts.'))
        tooltip_cols = ['SyslogMessage']
        dfs = {"User Logons" :user_remote_logons, "Failed Logons": user_failed_logons, "Sudo Events" :user_sudo_events}
        user_tl_data = {}

        for k,v in dfs.items():
            if v is not None and not v.empty:
                user_tl_data.update({k :{"data":v,"source_columns":tooltip_cols}})

        nbdisplay.display_timeline(
            data=user_tl_data, title="User logon timeline", height=300)

        palette = viridis(2)
        # Graph out failed/successful logons by account and by logon process
        all_user_df = pd.DataFrame(dict(successful=user_remote_logons['ProcessName'].value_counts(
        ), failed=user_failed_logons['ProcessName'].value_counts())).fillna(0).T

        user_processes = all_user_df.columns.values.tolist()

        fail_sucess_user_data = pd.DataFrame({'processes': user_processes,
                                         'success': all_user_df.loc['successful'].values.tolist(),
                                         'failure': all_user_df.loc['failed'].astype(int).values.tolist()})

        user_process_bar = fail_sucess_user_data.plot_bokeh.bar(
            x="processes", colormap=palette,  show_figure=False, title="Failed and Successful logon attempts by process")
        user_logons = pd.DataFrame({"Successful Logons" : [int(all_user_df.loc['successful'].sum())],
                                "Failed Logons" : [int(all_user_df.loc['failed'].sum())]}).T

        user_ratio_pie =user_logons.plot_bokeh.pie(colormap = palette,
                                                show_figure = False, title = "Relative Frequencies of Successful Logons by Account")

        pandas_bokeh.plot_grid([[user_ratio_pie, user_process_bar], 
                               []], plot_width = 450, plot_height = 300)


        # Convert logon IPs to IP entities in order to get location
        ip_entity = entityschema.IpAddress()
        
        user_ip_list = []
        for ip_logon in user_remote_logons['SourceIP']:
            user_ip_list.extend(convert_to_ip_entities(ip_logon))
        user_ip_fail_list = []
        for ip_logon in user_failed_logons['SourceIP']:
            user_ip_fail_list.extend(convert_to_ip_entities(ip_logon))
            
        folium_user_map=FoliumMap(location=location, zoom_start=3)
        if not user_ip_list and not user_ip_fail_list:
            print("No user events")
        elif not user_ip_list and user_ip_fail_list:
            icon_props={'color': 'red'}
            folium_user_map.add_ip_cluster(ip_entities=user_ip_fail_list, **icon_props)
        elif not user_ip_fail_list and user_ip_list:
            icon_props = {'color': 'green'}
            folium_user_map.add_ip_cluster(ip_entities=user_ip_list, **icon_props)
        else:
            icon_props = {'color': 'red'}
            folium_user_map.add_ip_cluster(ip_entities=user_ip_fail_list, **icon_props)
            icon_props = {'color': 'green'}
            folium_user_map.add_ip_cluster(ip_entities=user_ip_list, **icon_props)

        folium_user_map.center_map()
        display(HTML('<h3>Map of Originating Location of Logon Attempts</h3>'))
        display(folium_user_map)
        display(Markdown('<p style="color:red">Warning: the folium mapping library '
                         'does not display correctly in some browsers.</p><br>'
                         'If you see a blank image please retry with a different browser.'))



    #Display sudo activity of the user 
    if not isinstance(user_sudo_events, pd.DataFrame) or user_sudo_events.empty:
        display(HTML(f"No successful sudo activity for {username}"))
    else:
        user_sudo_hold = user_sudo_events
        user_sudo_commands = (user_sudo_events[['EventTime', 'CommandCall']]
                              .replace('', np.nan)
                              .groupby(['CommandCall'])
                              .count()
                              .dropna()
                              .style
                              .set_table_attributes('width=900px, text-align=center')
                              .background_gradient(cmap='Reds', low=.5, high=1)
                              .format("{0:0>3.0f}"))
        display(user_sudo_commands)
        display(HTML("Select a sudo command to investigate in more detail"))
        cmd = widgets.Dropdown(options=user_sudo_events['CommandCall'].replace(
                '', np.nan).dropna().unique().tolist(), description='Cmd:', disabled=False)
        display(widgets.interactive(view_sudo, cmd=cmd))
else:
    md("No user session selected")

In [None]:
# If the user has sudo activity, extract any IOCs from the logs and look them up in TI feeds
if not isinstance(user_sudo_hold, pd.DataFrame) or user_sudo_hold.empty:
    print("No sudo message data")
else:
    # Extract IOCs
    ioc_extractor = iocextract.IoCExtract()
    os_family = host_entity.OSType if host_entity.OSType else 'Linux'
    print('Extracting IoCs.......')
    ioc_df = ioc_extractor.extract(data=user_sudo_hold,
                                   columns=['SyslogMessage'],
                                   os_family=os_family,
                                   ioc_types=['ipv4', 'ipv6', 'dns', 'url', 'md5_hash', 'sha1_hash', 'sha256_hash'])
    if len(ioc_df) > 0:
        ioc_count = len(ioc_df[["IoCType", "Observable"]].drop_duplicates())
        display(HTML(f"Found {ioc_count} IOCs"))
        ti_resps = tilookup.lookup_iocs(data=ioc_df[["IoCType", "Observable"]].drop_duplicates(
        ).reset_index(), obs_col='Observable', ioc_type_col='IoCType')
        i = 0
        ti_hits = []
        ti_resps.reset_index(drop=True, inplace=True)
        while i < len(ti_resps):
            if ti_resps['Result'][i] == True and ti_check_sev(ti_resps['Severity'][i], 1):
                ti_hits.append(ti_resps['Ioc'][i])
                i += 1
            else:
                i += 1
        display(HTML(f"Found {len(ti_hits)} IoCs in Threat Intelligence"))
        for ioc in ti_hits:
            display(HTML(f"Messages containing IoC found in TI feed: {ioc}"))
            # Match the IoC literally; dots in IPs and domains are regex wildcards otherwise
            display(user_sudo_hold[user_sudo_hold['SyslogMessage'].str.contains(
                ioc, regex=False)][['TimeGenerated', 'SyslogMessage']])
    else:
        display(HTML("No IoC patterns found in Syslog Message."))

Jump to:
- <a>Host Logon Events</a>
- <a>User Activity</a>
- <a>Network Activity</a>

<a></a>
## Application Activity

**Hypothesis:** That an attacker has compromised an application running on the host and is using the applications process to conduct actions on the host.

This section provides an overview of activity by application within our hunting time frame, so that anomalous activity by an application can be identified. This hunt can be driven by investigation of suspected applications or run as a hunt across all applications seen on the host.
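
The volume charts in this section count an application's messages in fixed five-minute windows. To make that step concrete, here is a minimal standard-library sketch of the same bucketing; `bucket_counts` is a hypothetical helper written for this example, and the notebook itself does this with a pandas resample.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, minutes=5):
    """Count timestamps per fixed-width bucket, keyed by bucket start time."""
    width = timedelta(minutes=minutes)
    counts = Counter()
    for ts in timestamps:
        # Floor the timestamp to the start of its bucket, measured from midnight
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        offset = (ts - midnight) // width
        counts[midnight + offset * width] += 1
    return dict(sorted(counts.items()))


sample_times = [
    datetime(2020, 1, 1, 10, 1),
    datetime(2020, 1, 1, 10, 4, 30),
    datetime(2020, 1, 1, 10, 7),
]
volume = bucket_counts(sample_times)
for bucket_start, count in volume.items():
    print(bucket_start, count)
```

Unlike a resample, this sketch omits empty buckets, so gaps in logging show up as missing keys rather than zero counts.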

In [None]:
# Get list of Applications
apps = all_syslog['ProcessName'].replace('', np.nan).dropna().unique().tolist()
system_apps = ['sudo', 'CRON', 'systemd-resolved', 'snapd',
               '50-motd-news', 'systemd-logind', 'dbus-daemon', 'crontab']
if len(host_entity.Applications) > 0:
    installed_apps = []
    installed_apps.extend(x for x in apps if x not in system_apps)

    # Pick Applications
    app_select = nbwidgets.SelectString(description='Select application to investigate: ',
                                            item_list=installed_apps, width='75%', auto_display=True)
else:
    display(HTML("No applications other than standard OS applications present"))

In [None]:
from bokeh.models import ColumnDataSource, RangeTool
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import column
output_notebook()
# Get all syslog relating to these Applications
app = app_select.value
app_data = all_syslog.loc[all_syslog['ProcessName'] == app]

# App log volume over time
if isinstance(app_data, pd.DataFrame) and not app_data.empty:
    app_data_volume = app_data.set_index(
        "TimeGenerated").resample('5T').count()
    source = ColumnDataSource(
        data=dict(date=app_data_volume.index, count=app_data_volume['SyslogMessage']))
    p = figure(plot_height=300, plot_width=900, tools="xpan", toolbar_location=None,
               x_axis_type="datetime", x_axis_location="above", y_minor_ticks=2,
               title="Application syslog volume over time",
               background_fill_color="#efefef", x_range=(app_data_volume.index[int(len(app_data_volume.index)*.33)], app_data_volume.index[int(len(app_data_volume.index)*.66)]))
    p.line('date', 'count', source=source)
    p.yaxis.axis_label = 'Message volume'
    select = figure(title="Drag the middle and edges of the selection box to change the range above",
                    plot_height=130, plot_width=900, y_range=p.y_range,
                    x_axis_type="datetime", y_axis_type=None,
                    tools="", toolbar_location=None, background_fill_color="#efefef")
    range_tool = RangeTool(x_range=p.x_range)
    range_tool.overlay.fill_color = "navy"
    range_tool.overlay.fill_alpha = 0.2
    select.line('date', 'count', source=source)
    select.ygrid.grid_line_color = None
    select.add_tools(range_tool)
    select.toolbar.active_multi = range_tool
    show(column(p, select))
    app_high_sev = app_data[app_data['SeverityLevel'].isin(
        ['emerg', 'alert', 'crit', 'err', 'warning'])]
    if app_high_sev.empty:
        print(f"No high severity syslog messages for {app}")
    else:
        app_high_sev = app_high_sev.set_index(
            "TimeGenerated").resample('5T').count()
        hs_source = ColumnDataSource(
            data=dict(date=app_high_sev.index, count=app_high_sev['SyslogMessage']))
        hs_p = figure(plot_height=300, plot_width=900, tools="xpan", toolbar_location=None,
                      x_axis_type="datetime", x_axis_location="above", y_minor_ticks=2,
                      title="High Severity application syslog volume over time",
                      background_fill_color="#FCF1CB", x_range=(app_high_sev.index[int(len(app_high_sev.index)*.33)], app_high_sev.index[int(len(app_high_sev.index)*.66)]), y_range=(0, app_data_volume['SyslogMessage'].max()))
        hs_p.line('date', 'count', source=hs_source, line_color='red')
        hs_p.yaxis.axis_label = 'Message volume'
        hs_select = figure(title="Drag the middle and edges of the selection box to change the range above",
                           plot_height=130, plot_width=900, y_range=hs_p.y_range,
                           x_axis_type="datetime", y_axis_type=None,
                           tools="", toolbar_location=None, background_fill_color="#FCF1CB")
        hs_range_tool = RangeTool(x_range=hs_p.x_range)
        hs_range_tool.overlay.fill_color = "orange"
        hs_range_tool.overlay.fill_alpha = 0.2
        hs_select.line('date', 'count', source=hs_source, line_color='red')
        hs_select.ygrid.grid_line_color = None
        hs_select.add_tools(hs_range_tool)
        hs_select.toolbar.active_multi = hs_range_tool
        show(column(hs_p, hs_select))
else:
    display(HTML("No data for this application"))
# Check the application's messages for risky command-line patterns
risky_messages = risky_cmd_line(events=app_data, log_type="Syslog", cmd_field="SyslogMessage")
if not risky_messages:
    pass
else:
    print(risky_messages)

### Display process tree
Due to the large volume of data involved, you may wish to make your query window smaller.

In [None]:
if rel_alert_select is None or rel_alert_select.selected_alert is None:
    start = query_times.start
else:
    start = rel_alert_select.selected_alert['TimeGenerated']

# Set new investigation time windows based on the selected alert
proc_invest_times = nbwidgets.QueryTime(units='hours',
                                       max_before=6, max_after=3, before=2, origin_time=start)
proc_invest_times.display()

In [None]:
audit_table = None
app_audit_data = None
app = app_select.value
regex = r'.*audit.*\_cl?'
# Find the table containing auditd data
matches = ((re.match(regex, key, re.IGNORECASE)) for key in qry_prov.schema)
for match in matches:
    if match is not None:
        audit_table = match.group(0)

#Check if the amount of data expected to be returned is a reasonable size, if not prompt before continuing
if audit_table is not None:
    if isinstance(app_audit_data, pd.DataFrame):
        pass
    else:
        print('Collecting audit data, please wait this may take some time....')
        app_audit_query_count = f"""{audit_table} 
                    | where TimeGenerated >= datetime({proc_invest_times.start}) 
                    | where TimeGenerated <= datetime({proc_invest_times.end}) 
                    | where Computer == '{hostname}'
                    | summarize count()
                   """
        
        count_check = qry_prov.exec_query(query=app_audit_query_count)

        # Check for an empty result before reading the first row
        if not count_check.empty and count_check['count_'].iloc[0] > 100000:
            size = count_check['count_'].iloc[0]
            print(
                f"You are returning a very large dataset ({size} rows).",
                "It is recommended that you consider scoping the size\n",
                "of your query down.\n",
                "Are you sure you want to proceed?"
            )
            response = (input("Y/N") or "N")
        
#         app_audit_query = f"""{audit_table} 
#                     | where TimeGenerated >= datetime({proc_invest_times.start}) 
#                     | where TimeGenerated <= datetime({proc_invest_times.end}) 
#                     | where Computer == '{hostname}'
#                     | where RawData contains "sshd"
#                     """
        if (
            (count_check['count_'].iloc[0] < 100000)
            or (count_check['count_'].iloc[0] > 100000
                and response.casefold().startswith("y"))
        ):
            print("querying audit data...")
            audit_data = qry_prov.LinuxAudit.auditd_all(
                start=proc_invest_times.start, end=proc_invest_times.end, host_name=hostname
                )
            if isinstance(audit_data, pd.DataFrame) and not audit_data.empty:
                print("building process tree...")
                audit_events = auditdextract.extract_events_to_df(
                    data=audit_data
                )
                
                process_tree = auditdextract.generate_process_tree(audit_data=audit_events)
                plot_lim = 1000
                if len(process_tree) > plot_lim:
                    md_warn(f"More than {plot_lim} processes to plot, limiting to top {plot_lim}.")
                    process_tree[:plot_lim].mp_process_tree.plot(legend_col="exe")
                else:
                    process_tree.mp_process_tree.plot(legend_col="exe")
                # len() gives the row count; DataFrame.size is rows * columns
                size = len(audit_events)
                print(f"Collected {size} rows of data")
        else:
            print("Resize query window")
    
else:
    display(HTML("No audit events available"))

In [None]:
display(HTML(f"<h3>Process tree for {app}</h3>"))
#Generate process tree with auditd data around the selected process
# from msticpy.sectools import auditdextract
# if isinstance(audit_events, pd.DataFrame) and not audit_events.empty:
#     audit_events = auditdextract.extract_events_to_df(
#             data=app_audit_data, input_column='RawData')
#     if not audit_events[audit_events["exe"].str.contains(app, na=False)].empty:
#         procs = auditdextract.cluster_auditd_processes(audit_data=audit_events, app=app)
#         display(Markdown(f'{len(procs)} process events'))
#         process_tree = auditdextract.generate_process_tree(audit_data = audit_events, processes = procs)
#         nbdisplay.display_process_tree(process_tree)

#     else:
#         display(f"No process tree data avaliable for {app}")
#         process_tree = None
if not process_tree[process_tree["exe"].str.contains(app, na=False)].empty:    
    app_roots = process_tree[process_tree["exe"].str.contains(app)].apply(lambda x: ptree.get_root(process_tree, x), axis=1)
    trees = []
    for root in app_roots["source_index"].unique():
        trees.append(process_tree[process_tree["path"].str.startswith(root)])
    app_proc_trees = pd.concat(trees)
    app_proc_trees.mp_process_tree.plot(legend_col="exe", show_table=True)
else:
    display(f"No process tree data available for {app}")
    process_tree = None

### Application Logs with associated Threat Intelligence
These logs are associated with the process being investigated and include IOCs that appear in our TI feeds.
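
When matching an IoC against log messages, it matters whether the IoC is treated as a literal string or as a pattern. pandas `str.contains` interprets its argument as a regular expression unless `regex=False` is passed, so the dots in an IP address or domain match any character. The sketch below shows the difference using only the standard library; `messages_matching` is a hypothetical helper and the sample messages are invented for the example.

```python
import re


def messages_matching(ioc, messages, literal=True):
    """Return the messages that contain the IoC.

    With literal=True the IoC is escaped first, so characters such as
    '.' and '?' only match themselves.
    """
    pattern = re.compile(re.escape(ioc) if literal else ioc)
    return [message for message in messages if pattern.search(message)]


sample_messages = [
    "connection from 10.0.0.1 accepted",
    "device serial 10502011 registered",
]
as_regex = messages_matching("10.0.0.1", sample_messages, literal=False)
as_literal = messages_matching("10.0.0.1", sample_messages, literal=True)
print(len(as_regex), len(as_literal))
```

Literal matching also avoids errors from IoCs, such as URLs, that contain characters with special meaning in regular expressions.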

In [None]:
# Extract IOCs from syslog associated with the selected process
ioc_extractor = iocextract.IoCExtract()
os_family = host_entity.OSType if host_entity.OSType else 'Linux'
md('Extracting IoCs...')
ioc_df = ioc_extractor.extract(data=app_data,
                               columns=['SyslogMessage'],
                               os_family=os_family,
                               ioc_types=['ipv4', 'ipv6', 'dns', 'url',
                                          'md5_hash', 'sha1_hash', 'sha256_hash'])

if process_tree is not None and not process_tree.empty:
    app_process_tree = app_proc_trees.dropna(subset=['cmdline'])
    audit_ioc_df = ioc_extractor.extract(data=app_process_tree,
                                         columns=['cmdline'],
                                         os_family=os_family,
                                         ioc_types=['ipv4', 'ipv6', 'dns', 'url',
                                                    'md5_hash', 'sha1_hash', 'sha256_hash'])

    # pd.concat replaces DataFrame.append, which newer pandas releases no longer provide
    ioc_df = pd.concat([ioc_df, audit_ioc_df])
# Look up IOCs in TI feeds
if len(ioc_df) > 0:
    ioc_count = len(ioc_df[["IoCType", "Observable"]].drop_duplicates())
    md(f"Found {ioc_count} IOCs")
    md("Looking up threat intel...")
    ti_resps = tilookup.lookup_iocs(data=ioc_df[[
                                     "IoCType", "Observable"]].drop_duplicates().reset_index(drop=True), obs_col='Observable')
    i = 0
    ti_hits = []
    ti_resps.reset_index(drop=True, inplace=True)
    while i < len(ti_resps):
        if ti_resps['Result'][i] == True and ti_check_sev(ti_resps['Severity'][i], 1):
            ti_hits.append(ti_resps['Ioc'][i])
            i += 1
        else:
            i += 1
    display(HTML(f"Found {len(ti_hits)} IoCs in Threat Intelligence"))
    for ioc in ti_hits:
        display(HTML(f"Messages containing IoC found in TI feed: {ioc}"))
        # Match the IoC literally; dots in IPs and domains are regex wildcards otherwise
        display(app_data[app_data['SyslogMessage'].str.contains(
            ioc, regex=False)][['TimeGenerated', 'SyslogMessage']])
else:
    display(Markdown("### No IoC patterns found in Syslog Message."))

Jump to:
- <a>Host Logon Events</a>
- <a>User Activity</a>
- <a>Application Activity</a>

## Network Activity
**Hypothesis:** That an attacker is remotely communicating with the host in order to compromise the host or for C2 or data exfiltration purposes after compromising the host.

This section provides an overview of network activity to and from the host during the hunting time frame, so that anomalous network traffic can be identified. If you wish to investigate a specific IP in detail, it is recommended that you use the IP Explorer Notebook (include link).
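
To illustrate the first step of this section, finding IP addresses in syslog messages, the sketch below extracts IPv4 candidates with a regular expression, validates them with the standard `ipaddress` module, and separates private from public addresses. It is a simplified stand-in for the IoC extractor used in the cells below: `classify_ips` and the candidate pattern are assumptions made for the example, and IPv6 is left out to keep it short.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits; validity is checked separately
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def classify_ips(messages):
    """Collect valid IPv4 addresses from messages, split by private/public."""
    external, internal = set(), set()
    for message in messages:
        for candidate in IPV4_CANDIDATE.findall(message):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                # Looks like an IP but is not one (for example an octet above 255)
                continue
            (internal if address.is_private else external).add(str(address))
    return {"external": sorted(external), "internal": sorted(internal)}


sample_messages = [
    "Failed password for root from 8.8.8.8 port 4242 ssh2",
    "Accepted publickey for alice from 10.0.0.5 port 22 ssh2",
    "package version 999.1.1.1 installed",
]
ip_summary = classify_ips(sample_messages)
print(ip_summary)
```

Separating internal from external addresses early helps focus threat intelligence lookups on the addresses most likely to be meaningful.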

In [None]:
# Get list of IPs from Syslog and Azure Network Data
ioc_extractor = iocextract.IoCExtract()
os_family = host_entity.OSType if host_entity.OSType else 'Linux'
print('Finding IP addresses; this may take a few minutes...')
syslog_ips = ioc_extractor.extract(data=syslog_data,
                                   columns=['SyslogMessage'],
                                   os_family=os_family,
                                   ioc_types=['ipv4', 'ipv6'])


if 'AzureNetworkAnalytics_CL' not in qry_prov.schema:
    az_net_comms_df = None
    az_ips = None
else:
    if hasattr(host_entity, 'private_ips') and hasattr(host_entity, 'public_ips'):
        all_host_ips = host_entity.private_ips + \
            host_entity.public_ips + [host_entity.IPAddress]
    else:
        all_host_ips = [host_entity.IPAddress]
    host_ips = {'\'{}\''.format(i.Address) for i in all_host_ips}
    host_ip_list = ','.join(host_ips)

    # Each entry in host_ip_list is already single-quoted, so the list is not wrapped in quotes again
    az_ip_where = f"""| where (VMIPAddress in ({host_ip_list}) or SrcIP in ({host_ip_list}) or DestIP in ({host_ip_list})) and (AllowedOutFlows > 0 or AllowedInFlows > 0)"""
    az_net_comms_df = qry_prov.AzureNetwork.az_net_analytics(
        start=query_times.start, end=query_times.end, host_name=hostname, where_clause=az_ip_where)
    if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:
        az_ips = az_net_comms_df.query("PublicIPs != @host_entity.IPAddress")
    else:
        az_ips = None
if len(syslog_ips):
    IPs = syslog_ips[['IoCType', 'Observable']].drop_duplicates('Observable')
    display(f"Found {len(IPs)} IP Addresses associated with the host")
else:
    display(Markdown("### No IoC patterns found in Syslog Message."))
    
if az_ips is not None:
    # Append one Series to the other; "+" would join values element-wise by index
    ips = pd.concat([az_ips['PublicIPs'], syslog_ips['Observable']],
                    ignore_index=True).drop_duplicates()
else:
    ips = syslog_ips['Observable'].drop_duplicates()

if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:
    import warnings

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")

        az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + \
            az_net_comms_df['AllowedInFlows']
        sns.catplot(x="L7Protocol", y="TotalAllowedFlows",
                    col="FlowDirection", data=az_net_comms_df)
        sns.relplot(x="FlowStartTime", y="TotalAllowedFlows",
                    col="FlowDirection", kind="line",
                    hue="L7Protocol", data=az_net_comms_df).set_xticklabels(rotation=50)

    nbdisplay.display_timeline(data=az_net_comms_df.query('AllowedOutFlows > 0'),
                               overlay_data=az_net_comms_df.query(
                                   'AllowedInFlows > 0'),
                               title='Network Flows (out=blue, in=green)',
                               time_column='FlowStartTime',
                               source_columns=[
                                   'FlowType', 'AllExtIPs', 'L7Protocol', 'FlowDirection'],
                               height=300)
else:
    print('No Azure network data for specified time range.')

### Choose ASNs/IPs to Check for Threat Intel Reports
Choose the ASNs whose IPs you wish to check from the selection list. Then select the IP(s) that you wish to check against Threat Intelligence data.
The list is populated with all ASNs found in the syslog and network flow data.
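
Before an IP is sent for a whois lookup, it helps to filter out addresses that cannot belong to a public ASN. The minimal sketch below, using only the standard library `ipaddress` module, shows that classification on its own; `classify_address` is a hypothetical helper name and the labels are chosen for illustration.

```python
from ipaddress import ip_address


def classify_address(value):
    """Label a string as a private, other (non-global), or global IP address."""
    try:
        ip = ip_address(value)
    except ValueError:
        return "not an IP address"
    if ip.is_private:
        return "private address"
    if not ip.is_global:
        # For example the 100.64.0.0/10 shared address space
        return "other address"
    return "global address"


for value in ["10.0.0.5", "100.64.0.1", "8.8.8.8", "not-an-ip"]:
    print(value, "->", classify_address(value))
```

Only addresses labelled as global are worth a whois query, which keeps the number of network lookups down.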

In [None]:

from functools import lru_cache
from ipwhois import IPWhois
from ipaddress import ip_address

#Lookup each IP in whois data and extract the ASN
@lru_cache(maxsize=1024)
def whois_desc(ip_lookup, progress=False):
    try:
        ip = ip_address(ip_lookup)
    except ValueError:
        return "Not an IP Address"
    if ip.is_private:
        return "private address"
    if not ip.is_global:
        return "other address"
    whois = IPWhois(ip)
    whois_result = whois.lookup_whois()
    if progress:
        print(".", end="")
    return whois_result["asn_description"]

# Summarise network data by ASN
ASN_List = []
print("WhoIs Lookups")
ASNs = ips.apply(lambda x: whois_desc(x, True))
IP_ASN = pd.DataFrame(dict(IPs=ips, ASN=ASNs)).reset_index()
x = IP_ASN.groupby(["ASN"]).count().drop(
    'index', axis=1).sort_values('IPs', ascending=False)
display(x)
ASN_List = x.index

# Select an ASN to investigate in more detail
selection = widgets.SelectMultiple(
    options=ASN_List,
    width=900,
    description='Select ASN to investigate',
    disabled=False
)
selection

In [None]:
# For every IP associated with the selected ASNs look them up in TI feeds
ip_invest_list = []
for ASN in selection.value:
    ip_invest_list += IP_ASN[IP_ASN["ASN"] == ASN]['IPs'].tolist()

# Defined up front so the next cell can test it even when nothing is found
ip_selection = None
if ip_invest_list:
    ti_resps = tilookup.lookup_iocs(data=ip_invest_list, providers=["OTX"])
    # Keep IPs that appear in at least one OTX pulse
    ioc_ip_list = [
        row['Ioc'] for _, row in ti_resps.iterrows()
        if row['Details']['pulse_count'] > 0
    ]
    display(HTML(f"Found {len(ioc_ip_list)} IoCs in Threat Intelligence"))

    # Show IPs found in TI feeds for further investigation
    if ioc_ip_list:
        display(HTML("Select an IP which appeared in TI to investigate further"))
        ip_selection = nbwidgets.SelectString(
            description='Select IP Address to investigate: ',
            item_list=ioc_ip_list, width='95%', auto_display=True)
else:
    md("No IPs to investigate")

In [None]:
# Get all syslog for the IPs
if ip_selection is not None:
    display(HTML("Syslog data associated with this IP Address"))
    # regex=False matches the address literally, so '.' is not treated as a wildcard
    sys_hits = all_syslog[all_syslog['SyslogMessage'].str.contains(
        ip_selection.value, regex=False)]
    display(sys_hits)
    os_family = host_entity.OSType if host_entity.OSType else 'Linux'

    display(HTML("TI result for this IP Address"))
    display(ti_resps[ti_resps['Ioc'] == ip_selection.value])
else:
    md("No IP address selected")

## Configuration

### `msticpyconfig.yaml` Configuration File
You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This file is read from the current directory, or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.

To configure this file, see the [ConfiguringNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb).
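
To check which configuration file would be picked up before running the notebook, a small helper along the following lines can be used. This is a minimal sketch that assumes the `MSTICPYCONFIG` environment variable is checked before the current directory; `find_msticpy_config` is a hypothetical helper and not part of msticpy.

```python
import os
from pathlib import Path


def find_msticpy_config(env_var="MSTICPYCONFIG", default_name="msticpyconfig.yaml"):
    """Return the config file path, or None if no file is found.

    Assumption: the environment variable takes precedence over the
    current directory.
    """
    env_path = os.environ.get(env_var)
    if env_path and Path(env_path).is_file():
        return Path(env_path)
    local_path = Path.cwd() / default_name
    if local_path.is_file():
        return local_path
    return None


config_path = find_msticpy_config()
print(config_path if config_path else "No msticpyconfig.yaml found")
```

If nothing is found, either copy `msticpyconfig.yaml` into the notebook directory or set `MSTICPYCONFIG` to its full path before starting the notebook.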