# Title: msticpy - VirusTotal Lookup

## Disclaimer and Acknowledgements:
The code in this module is offered as a convenience wrapper for the VirusTotal API based on the [public documentation](https://www.virustotal.com/en/documentation/public-api/). The code does not originate from VirusTotal, nor is it endorsed by them. I'd like thank them for
- Wonderfully clear documention and examples
- Granting me extra querying capacity for my account for testing

You must have msticpy installed to run this notebook:
```
!pip install --upgrade msticpy
```

## New Features ##
This is quite and old notebook and some developments largely supercede this
component.
- Virus Total queries have been integrated into the core TILookup functionality in MSTICPy
  See the [TIProviders notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/TIProviders.ipynb)
  and the [TIProviders documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html)
  for more details
  
- VirusTotal V3 API - VT have release a new version of their API which allows graph
  traversal to get information about how malware and actors are linked.
  See the [VT3Lookup notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/VTLookupV3.ipynb)
  for more details

## Introduction
This class allows you to submit Indicators of Compromise (IoC) to VirusTotal and receive and process the content of the response. You can submit a single item or a set of items in a column of a pandas DataFrame.



VirusTotal supports the following IoC Types:
- FileHash
- URL
- IP Address (v4)
- DNS Domain

The first two of these result in full reports of malicious content from scans. The IP Address and DNS items provide secondary lookup - e.g. if IP Address 111.222.3.5 or www.evil.net is linked to a positive (malicious) report for a URL, the latter report will be returned in the results. VT does not report directly on the reputation of IP addresses or DNS domains.

## Virus Total Lookup
To use this module need an API key from virus total, which you can obtain here: https://www.virustotal.com/.

Note that VT throttles requests for free API keys to 4/minute. If you are unable to process the entire data set, try splitting it and submitting smaller chunks.

**Things to note:**
- Virus Total lookups include file hashes, domains, IP addresses and URLs.
- The returned data is slightly different depending on the input type
- The VTLookup class tries to screen input data to prevent pointless lookups. E.g.:
  - Only public IP Addresses will be submitted (no loopback, private address space, etc.)
  - URLs with only local (unqualified) host parts will not be submitted.
  - Domain names that are unqualified will not be submitted.
  - Hash-like strings (e.g 'AAAAAAAAAAAAAAAAAA') that do not appear to have enough entropy to be a hash will not be submitted.
  - If submitted in a batch (i.e. using a DataFrame as input) duplicate IoCs are not submitted. Duplicates will be given the results from the original lookip 



<a id='contents'></a>
## Table of Contents
- [VirusTotal API Key](#api_key)
- [Looking up Single IoC](#single_ioc_lookup)
- [Interpreting the Output](#interpreting_output)
- [Using a DataFrame as input](#dataframe_input)



In [1]:
# Imports
import sys
MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)


from IPython.display import display
import pandas as pd

import msticpy.sectools as sectools
import msticpy.nbtools as mas
from msticpy.sectools import VTLookup, IoCExtract

<a id='api_key'></a>[Contents](#contents)
## You will need a VirusTotal API key
You will get more detailed results if you have a private API key but you can get a lot of good information using the public API and a free API key. You are however limited in the number of requests you can make.

In [2]:
# Enter your VT Key here
vt_key = mas.GetEnvironmentKey(env_var='VT_API_KEY',
                           help_str='To obtain an API key sign up here https://www.virustotal.com/',
                           prompt='Virus Total API key:')
vt_key.display()

HBox(children=(Text(value='***REMOVED***', description='Virâ€¦

In [7]:
# Create an instance of the class

vt_lookup = sectools.VTLookup(vt_key.value, verbosity=2)

<a id='single_ioc_lookups'></a>[Contents](#contents)
## Looking up Single IoCs
```
Signature: vt_lookup.lookup_ioc(observable: str, ioc_type: str, output: str = 'dict')
Docstring:
Look up and single IoC observable.

    :param observable: The observable value
    :param ioc_type: The IoC Type (see 'supported_ioc_types' attribute)
    :param output='dict': Output results as a dictionary (or list of dicts)
        if output is any other value the result will be returned in a
        Pandas DataFrame

    Returns:
        list{dict}: if output == 'dict'
        pd.DataFrame: otherwise
```

In [4]:
# Default output type for single item is a dict
vt_lookup.lookup_ioc(observable='90.156.201.97', ioc_type='ipv4')

{'Observable': '90.156.201.97',
 'IoCType': 'ipv4',
 'Status': 'Success',
 'ResponseCode': 1,
 'RawResponse': '{"undetected_downloaded_samples": [{"date": "2019-03-12 19:02:12", "positives": 0, "total": 46, "sha256": "5c51cf182781dbd3fdbe3fe8a6e01742ab02729cf9c4c2450f3699ab15fd7ba9"}, {"date": "2018-12-08 12:29:43", "positives": 0, "total": 70, "sha256": "1c879f33fdfdad829682b3572652178b4d8344d6b1001fabafea2e6897cd7c5a"}, {"date": "2019-02-27 17:31:43", "positives": 0, "total": 57, "sha256": "78342a0905a72ce44da083dcb5d23b8ea0c16992ba2a82eece97e033d76ba3d3"}, {"date": "2019-02-11 13:06:10", "positives": 0, "total": 71, "sha256": "0f774764181a1d850141bf64393228b7acdb6261844f0165a78839f549d7bcce"}, {"date": "2019-02-13 19:22:34", "positives": 0, "total": 55, "sha256": "13e5f2a6c4bbed674eea0e0bff9a78fc9b38a5b1f83fb69552b4673fe796e8c0"}, {"date": "2019-02-13 06:12:51", "positives": 0, "total": 56, "sha256": "7aada93462e39cd1370151b2dfe6254328d2b8e16dc927cb56689fc1334ee86c"}, {"date": "2019

### DataFrame output can be a cleaner than a dict
Note that re-using the same class for multiple lookups accumulates the results in the the class results DataFrame

In [34]:
# DataFrame output can be a cleaner
vt_lookup = sectools.VTLookup(vt_key.value, verbosity=2)

print('IP Lookup')
display(vt_lookup.lookup_ioc(observable='90.156.201.97', 
                             ioc_type='ipv4', output='dataframe'))

print('\n+ MD5 Hash Lookup')
display(vt_lookup.lookup_ioc(observable='7657fcb7d772448a6d8504e4b20168b8', 
                             ioc_type='md5_hash', output='dataframe'))
    
print('\n+ URL Lookup')

url ='http://club-fox.ru/img/www.loginalibaba.com/alibaba/alibaba/login.alibaba.com.php?email=biuro'
vt_lookup.lookup_ioc(observable=url, ioc_type='url', output='dataframe')

Unnamed: 0,Observable,IoCType,Status,ResponseCode,RawResponse,Resource,SourceIndex,VerboseMsg,Resource.1,ScanId,Permalink,Positives,MD5,SHA1,SHA256,ResolvedDomains,ResolvedIPs,DetectedUrls
0,90.156.201.97,ipv4,Success,1,"{""asn"": ""25532"", ""undetected_downloaded_sample...",,0,IP address in dataset,,,,350,,,,"0-1000v.ru, 00004.ru, 01sasha.ru, 027.ru, 03ma...",,"http://remont-iphone-spb.com/, http://www.prov..."


Unnamed: 0,Observable,IoCType,Status,ResponseCode,RawResponse,Resource,SourceIndex,VerboseMsg,Resource.1,ScanId,Permalink,Positives,MD5,SHA1,SHA256,ResolvedDomains,ResolvedIPs,DetectedUrls
0,90.156.201.97,ipv4,Success,1,"{""asn"": ""25532"", ""undetected_downloaded_sample...",,0,IP address in dataset,,,,350,,,,"0-1000v.ru, 00004.ru, 01sasha.ru, 027.ru, 03ma...",,"http://remont-iphone-spb.com/, http://www.prov..."
1,7657fcb7d772448a6d8504e4b20168b8,md5_hash,Success,1,"{""scans"": {""Bkav"": {""detected"": true, ""version...",7657fcb7d772448a6d8504e4b20168b8,0,"Scan finished, information embedded",7657fcb7d772448a6d8504e4b20168b8,54bc950d46a0d1aa72048a17c8275743209e6c17bdacfc...,https://www.virustotal.com/file/54bc950d46a0d1...,59,7657fcb7d772448a6d8504e4b20168b8,84c7201f7e59cb416280fd69a2e7f2e349ec8242,54bc950d46a0d1aa72048a17c8275743209e6c17bdacfc...,,,


URL Lookup


Unnamed: 0,Observable,IoCType,Status,ResponseCode,RawResponse,Resource,SourceIndex,VerboseMsg,Resource.1,ScanId,Permalink,Positives,MD5,SHA1,SHA256,ResolvedDomains,ResolvedIPs,DetectedUrls
0,90.156.201.97,ipv4,Success,1,"{""asn"": ""25532"", ""undetected_downloaded_sample...",,0,IP address in dataset,,,,350,,,,"0-1000v.ru, 00004.ru, 01sasha.ru, 027.ru, 03ma...",,"http://remont-iphone-spb.com/, http://www.prov..."
1,7657fcb7d772448a6d8504e4b20168b8,md5_hash,Success,1,"{""scans"": {""Bkav"": {""detected"": true, ""version...",7657fcb7d772448a6d8504e4b20168b8,0,"Scan finished, information embedded",7657fcb7d772448a6d8504e4b20168b8,54bc950d46a0d1aa72048a17c8275743209e6c17bdacfc...,https://www.virustotal.com/file/54bc950d46a0d1...,59,7657fcb7d772448a6d8504e4b20168b8,84c7201f7e59cb416280fd69a2e7f2e349ec8242,54bc950d46a0d1aa72048a17c8275743209e6c17bdacfc...,,,
2,http://club-fox.ru/img/www.loginalibaba.com/al...,url,Success,1,"{""scan_id"": ""700994c09c45224fd5d6cb938e043ce64...",http://club-fox.ru/img/www.loginalibaba.com/al...,0,"Scan finished, scan information embedded in th...",http://club-fox.ru/img/www.loginalibaba.com/al...,700994c09c45224fd5d6cb938e043ce648baa2231401e7...,https://www.virustotal.com/url/700994c09c45224...,12,,,,,,


<a id='interpreting_output'></a>[Contents](#contents)
## Interpreting the Output
Columns in the output dataframe are as follows:
 - Observable - The IoC observable submitted
 - IoCType - the IoC type
 - Status - the status of the submission request
 - ResponseCode - the VT response code
 - RawResponse - the entire raw json response
 - Resource - VT Resource
 - SourceIndex - The index of the Observable in the source DataFrame. You can use this to rejoin to your original data.
 - VerboseMsg - VT Verbose Message
 - ScanId - VT Scan ID if any
 - Permalink - VT Permanent URL describing the resource
 - Positives - If this is not zero, it indicates the number of malicious reports that VT holds for this observable.
 - MD5 - The MD5 hash, if any
 - SHA1 - The MD5 hash, if any
 - SHA256 - The MD5 hash, if any
 - ResolvedDomains - In the case of IP Addresses, this contains a list of all domains that resolve to this IP address
 - ResolvedIPs - In the case Domains, this contains a list of all IP addresses resolved from the domain.
 - DetectedUrls - Any malicious URLs associated with the observable.

In [19]:
display(pd.DataFrame(vt_lookup.results.loc[0].T))
print(f'{len(vt_lookup.results.loc[0].ResolvedDomains)} resolved domains')
print('Showing first 10')
display(vt_lookup.results.loc[0].ResolvedDomains.split(',')[0:10])

print(f'{len(vt_lookup.results.loc[0].DetectedUrls)} detected urls')
print('Showing first 10 (Don\'t click on any of these!)')
display(vt_lookup.results.loc[0].DetectedUrls.split(',')[0:10])

Unnamed: 0,0
Observable,90.156.201.97
IoCType,ipv4
Status,Success
ResponseCode,1
RawResponse,"{""undetected_downloaded_samples"": [{""date"": ""2..."
Resource,
SourceIndex,0
VerboseMsg,IP address in dataset
Resource,
ScanId,


14456 resolved domains
Showing first 10


['0-1000v.ru',
 ' 00004.ru',
 ' 01sasha.ru',
 ' 027.ru',
 ' 03magnet.com',
 ' 03magnet.ru',
 ' 04gaz.ru',
 ' 0525.ru',
 ' 0987654321.ru',
 ' 0notole.ru']

4870 detected urls
Showing first 10 (Don't click on any of these!)


['http://remont-iphone-spb.com/',
 ' http://www.provetom.ru/art/art_2.htm',
 ' http://gubino.net/',
 ' http://thar.ru/',
 ' http://alliance-pravo.com/',
 ' http://ventkanal.ru/kwdl38g',
 ' http://autolombard.club/',
 ' http://moscowbmw.ru/',
 ' http://belowtheweb.ru/avia/300%C3%97500/images/pikz.zip',
 ' http://www.maxsev.ru/']

### IoC Types Available
There are 4 basic IoC types used by Virus Total. Hashes of all types (include SHA256 Authenticode) are covered by the 'file' type.

In [26]:
# Types that you need to supply to the lookup calls
VTLookup._SUPPORTED_INPUT_TYPES

['ipv4', 'dns', 'url', 'md5_hash', 'sha1_hash', 'sh256_hash']

In [27]:
# How these map to VT lookup types
VTLookup._VT_TYPE_MAP

{'ipv4': 'ip-address',
 'ipv6': None,
 'dns': 'domain',
 'url': 'url',
 'md5_hash': 'file',
 'sha1_hash': 'file',
 'sh256_hash': 'file'}

<a id='dataframe_input'></a>[Contents](#contents)
## Input from a DataFrame

**WARNING** The VirusTotal Public API allows a maximum of 4 requests a minute. If you start seeing HTTP Error 403, you've probably hit this limit

API Signature
```
vt_lookup.lookup_iocs(
    ['data: pandas.core.frame.DataFrame', "src_col: str = 'Observable'", "type_col: str = 'IoCType'", "src_index_col: str = 'SourceIndex'", '**kwargs'],
) -> pandas.core.frame.DataFrame
Docstring:
lookup_iocs: main lookup method.

Tries to retrieve results for IoC observables in the source dataframe.

    :param data: dataframe containing the observables to search for
    :param src_col: the column name that contains the observable data
        (one item per row)
    :param type_col: the column name containing the observable type
    :param source_index: the name of the column to use as source index. If not
        specified this defaults to 'SourceIndex'. If this (or the supplied value)
        is not in the source dataframe the index of the source dataframe will
        be used. This is retained in the output so that you can join the results
        back to the original data.
    :param kwargs: key/value pairs of additional mappings to supported IoC type names
        e.g. ipv4='ipaddress', url='httprequest'. This allows you to specify custom
        mappings when the source data is tagged with different names.

Returns:
    pd.DataFrame: VT Results

See supported_ioc_types attribute for a list of valid target types.
Not all of these types are supported by VirusTotal. See ioc_vt_type_mapping for
current mappings. Types mapped to None will not be submitted to VT.

For urls a full http request can be submitted, query string and fragments will be
dropped before submitting. Other supported protocols are ftp, telnet, ldap, file
For files MD5, SHA1 and SHA256 hashes are supported.
For IP addresses only dotted IPv4 addresses are supported.
```

#### Load test data and extract some IoCs from it

In [3]:
# Load test data
process_tree = pd.read_csv('data/process_tree.csv')
process_tree[['CommandLine']].head()

Unnamed: 0,CommandLine
0,.\ftp -s:C:\RECYCLER\xxppyy.exe
1,.\reg not /domain:everything that /sid:shines...
2,"cmd /c ""systeminfo && systeminfo"""
3,.\rundll32 /C 42424.exe
4,.\rundll32 /C c:\users\MSTICAdmin\42424.exe


In [8]:
# Use our Regex IoC extractor to pull out things that look like IoCs from the Commandline
ioc_extractor = IoCExtract()
vt_lookup = VTLookup(vt_key.value, verbosity=2)
output_df = ioc_extractor.extract(data=process_tree, 
                                  columns=['CommandLine'], 
                                  ioc_types=vt_lookup.supported_ioc_types)
output_df

Unnamed: 0,IoCType,Observable,SourceIndex
0,dns,tsetup.1.exe,9
1,dns,tsetup.1.0.14.exe,9
2,dns,tsetup.1.0.14.tmp,9
3,dns,doubleextension.pdf.exe,20
4,url,http://server/file.sct,31
5,dns,server,31
6,url,http://somedomain/best-kitten-names-1.jpg',37
7,dns,somedomain,37
8,md5_hash,aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,40
9,md5_hash,aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,41


### Submit these to VirusTotal
Note that most of these the IoC observables found by a simple regex extraction were rejected before submitting to VT. As well as checking for duplicates this module also filters out things like 
- loopback/private IPs
- unqualified and unresolvable domain names
- strings of hex characters that are probably not hashes 

In [6]:
vt_lookup = VTLookup(vt_key.value, verbosity=2)

# Submit a subset of the found IoCs (ignore dns since a lot of )
vt_results = vt_lookup.lookup_iocs(data=output_df, 
                                   type_col='IoCType', 
                                   src_col='Observable')

display(vt_results)

Submitting observables: "1.2.3.4", type "ipv4" to VT. (Source index 78)
Invalid observable format: "127.0.0.1", type "ipv4", status: IP is private address - skipping. (Source index 102)
Invalid observable format: "tsetup.1.exe", type "dns", status: Domain not resolvable - skipping. (Source index 9)
Invalid observable format: "tsetup.1.0.14.exe", type "dns", status: Domain not resolvable - skipping. (Source index 9)
Invalid observable format: "tsetup.1.0.14.tmp", type "dns", status: Domain not resolvable - skipping. (Source index 9)
Invalid observable format: "doubleextension.pdf.exe", type "dns", status: Domain not resolvable - skipping. (Source index 20)
Invalid observable format: "server", type "dns", status: Observable does not match expected pattern for dns - skipping. (Source index 31)
Invalid observable format: "somedomain", type "dns", status: Observable does not match expected pattern for dns - skipping. (Source index 37)
Invalid observable format: "badguyserver", type "dns", s

Unnamed: 0,Observable,IoCType,Status,ResponseCode,RawResponse,Resource,SourceIndex,VerboseMsg,Resource.1,ScanId,Permalink,Positives,MD5,SHA1,SHA256,ResolvedDomains,ResolvedIPs,DetectedUrls
0,1.2.3.4,ipv4,Success,1.0,"{""asn"": ""15169"", ""undetected_referrer_samples""...",,78,IP address in dataset,,,,162.0,,,,"%2a.netaccess-india.com, 0-9.dgjtest030-pp-qm-...",,"http://1.2.3.4:8347/, http://1.2.3.4/, http://..."
1,127.0.0.1,ipv4,IP is private address,,,,102,,,,,,,,,,,
2,tsetup.1.exe,dns,Domain not resolvable,,,,9,,,,,,,,,,,
3,tsetup.1.0.14.exe,dns,Domain not resolvable,,,,9,,,,,,,,,,,
4,tsetup.1.0.14.tmp,dns,Domain not resolvable,,,,9,,,,,,,,,,,
5,doubleextension.pdf.exe,dns,Domain not resolvable,,,,20,,,,,,,,,,,
6,server,dns,Observable does not match expected pattern for...,,,,31,,,,,,,,,,,
7,somedomain,dns,Observable does not match expected pattern for...,,,,37,,,,,,,,,,,
8,badguyserver,dns,Observable does not match expected pattern for...,,,,46,,,,,,,,,,,
9,badguyserver,dns,Observable does not match expected pattern for...,,,,47,,,,,,,,,,,
