--- name: extracting-iocs-from-malware-samples description: 'Extracts indicators of compromise (IOCs) from malware samples including file hashes, network indicators (IPs, domains, URLs), host artifacts (file paths, registry keys, mutexes), and behavioral patterns for threat intelligence sharing and detection rule creation. Activates for requests involving IOC extraction, threat indicator harvesting, malware indicator collection, or building detection content from samples. ' domain: cybersecurity subdomain: malware-analysis tags: - malware - IOC-extraction - threat-intelligence - indicators - detection version: 1.0.0 author: mahipal license: Apache-2.0 nist_csf: - DE.AE-02 - RS.AN-03 - ID.RA-01 - DE.CM-01 --- # Extracting IOCs from Malware Samples ## When to Use - A malware analysis (static or dynamic) is complete and actionable indicators need to be extracted for defense teams - Building blocklists for firewalls, proxies, and DNS sinkholes from analyzed samples - Creating YARA rules, Snort/Suricata signatures, or SIEM detection content from malware artifacts - Contributing to threat intelligence sharing platforms (MISP, OTX, ThreatConnect) - Tracking malware campaigns by correlating IOCs across multiple samples **Do not use** for IOCs from unverified sources without validation; false positives in blocklists can disrupt legitimate business operations. ## Prerequisites - Python 3.8+ with `iocextract`, `pefile`, `yara-python` libraries installed - Completed malware analysis report (static analysis, dynamic analysis, or reverse engineering) - Access to PCAP files, memory dumps, or sandbox reports from the analysis - MISP instance or STIX/TAXII server for structured IOC sharing - VirusTotal API key for IOC enrichment and validation - CyberChef for decoding obfuscated indicators ## Workflow ### Step 1: Extract File-Based IOCs Compute hashes and identify file metadata indicators: ```bash # Generate all standard hashes md5sum malware_sample.exe sha1sum malware_sample.exe sha256sum malware_sample.exe # Generate ssdeep fuzzy hash for similarity matching ssdeep malware_sample.exe # Generate imphash (import hash) for PE files python3 -c " import pefile pe = pefile.PE('malware_sample.exe') print(f'Imphash: {pe.get_imphash()}') " # Generate TLSH (Trend Micro Locality Sensitive Hash) python3 -c " import tlsh with open('malware_sample.exe', 'rb') as f: h = tlsh.hash(f.read()) print(f'TLSH: {h}') " # Compile file metadata IOCs python3 << 'PYEOF' import pefile import os import hashlib import datetime pe = pefile.PE("malware_sample.exe") print("FILE IOCs:") with open("malware_sample.exe", "rb") as f: data = f.read() print(f" MD5: {hashlib.md5(data).hexdigest()}") print(f" SHA-1: {hashlib.sha1(data).hexdigest()}") print(f" SHA-256: {hashlib.sha256(data).hexdigest()}") print(f" File Size: {len(data)} bytes") ts = pe.FILE_HEADER.TimeDateStamp print(f" Compile: {datetime.datetime.utcfromtimestamp(ts)} UTC") print(f" Imphash: {pe.get_imphash()}") PYEOF ``` ### Step 2: Extract Network IOCs Pull network indicators from strings, PCAP, and sandbox reports: ```python # Extract network IOCs from strings import re with open("malware_sample.exe", "rb") as f: data = f.read() # Extract ASCII and Unicode strings ascii_strings = re.findall(b'[ -~]{4,}', data) unicode_strings = re.findall(b'(?:[ -~]\x00){4,}', data) all_strings = [s.decode('ascii', errors='ignore') for s in ascii_strings] all_strings += [s.decode('utf-16-le', errors='ignore') for s in unicode_strings] # IP addresses (excluding private ranges for C2 indicators) ip_pattern = re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\b') ips = set() for s in all_strings: for ip in ip_pattern.findall(s): # Filter out private/reserved ranges octets = [int(o) for o in ip.split('.')] if octets[0] not in [10, 127, 0] and not (octets[0] == 172 and 16 <= octets[1] <= 31) and not (octets[0] == 192 and octets[1] == 168): ips.add(ip) # Domain names domain_pattern = re.compile(r'\b[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z]{2,})+\b') domains = set() for s in all_strings: for d in domain_pattern.findall(s): if not d.endswith(('.dll', '.exe', '.sys', '.com.au')): domains.add(d) # URLs url_pattern = re.compile(r'https?://[^\s<>"{}|\\^`\[\]]+') urls = set() for s in all_strings: for u in url_pattern.findall(s): urls.add(u) print("NETWORK IOCs:") print(f" IPs: {ips}") print(f" Domains: {domains}") print(f" URLs: {urls}") ``` ### Step 3: Extract Host-Based IOCs Identify file paths, registry keys, mutexes, and services: ```python # Extract host-based IOCs from sandbox report import json with open("cuckoo_report.json") as f: report = json.load(f) print("HOST IOCs:") # File paths created or modified print("\nFile Paths:") for f in report["behavior"]["summary"].get("files", []): if any(p in f.lower() for p in ["temp", "appdata", "system32", "programdata"]): print(f" [DROPPED] {f}") # Registry keys for persistence print("\nRegistry Keys:") for key in report["behavior"]["summary"].get("write_keys", []): if any(p in key.lower() for p in ["run", "service", "startup", "shell"]): print(f" [PERSIST] {key}") # Mutexes (unique to malware family) print("\nMutexes:") for mutex in report["behavior"]["summary"].get("mutexes", []): if mutex not in ["Local\\!IETld!Mutex", "RasPbFile"]: # Filter known Windows mutexes print(f" [MUTEX] {mutex}") # Created services print("\nServices:") for svc in report["behavior"]["summary"].get("started_services", []): print(f" [SERVICE] {svc}") ``` ### Step 4: Extract Network IOCs from PCAP Parse network captures for additional indicators: ```bash # Extract DNS queries from PCAP tshark -r capture.pcap -T fields -e dns.qry.name -Y "dns.flags.response == 0" | sort -u # Extract HTTP hosts and URLs tshark -r capture.pcap -T fields -e http.host -e http.request.uri -Y "http.request" | sort -u # Extract TLS server names (SNI) tshark -r capture.pcap -T fields -e tls.handshake.extensions_server_name -Y "tls.handshake.type == 1" | sort -u # Extract JA3 hashes tshark -r capture.pcap -T fields -e tls.handshake.ja3 -Y "tls.handshake.type == 1" | sort -u # Extract unique destination IPs tshark -r capture.pcap -T fields -e ip.dst -Y "ip.src == 10.0.2.15" | sort -u # Extract User-Agent strings tshark -r capture.pcap -T fields -e http.user_agent -Y "http.user_agent" | sort -u ``` ### Step 5: Defang and Validate IOCs Defang indicators for safe sharing and validate against threat intelligence: ```python # Defang IOCs for safe sharing def defang_ip(ip): return ip.replace(".", "[.]") def defang_url(url): return url.replace("http", "hxxp").replace(".", "[.]") def defang_domain(domain): return domain.replace(".", "[.]") # Validate IOCs against VirusTotal import requests VT_API_KEY = "your_api_key" def check_vt_ip(ip): resp = requests.get(f"https://www.virustotal.com/api/v3/ip_addresses/{ip}", headers={"x-apikey": VT_API_KEY}) data = resp.json() stats = data["data"]["attributes"]["last_analysis_stats"] return stats["malicious"] def check_vt_domain(domain): resp = requests.get(f"https://www.virustotal.com/api/v3/domains/{domain}", headers={"x-apikey": VT_API_KEY}) data = resp.json() stats = data["data"]["attributes"]["last_analysis_stats"] return stats["malicious"] # Validate each IOC for ip in ips: detections = check_vt_ip(ip) print(f" {defang_ip(ip)} - VT: {detections} detections") ``` ### Step 6: Export IOCs in Standard Formats Generate structured IOC outputs for sharing and ingestion: ```python # Export as STIX 2.1 bundle from stix2 import Indicator, Bundle, Malware, Relationship import datetime indicators = [] # File hash indicator indicators.append(Indicator( name="Malware SHA-256 Hash", pattern=f"[file:hashes.'SHA-256' = '{sha256_hash}']", pattern_type="stix", valid_from=datetime.datetime.now(datetime.timezone.utc), labels=["malicious-activity"] )) # IP indicator for ip in ips: indicators.append(Indicator( name=f"C2 IP Address {ip}", pattern=f"[ipv4-addr:value = '{ip}']", pattern_type="stix", valid_from=datetime.datetime.now(datetime.timezone.utc), labels=["malicious-activity"] )) # Domain indicator for domain in domains: indicators.append(Indicator( name=f"C2 Domain {domain}", pattern=f"[domain-name:value = '{domain}']", pattern_type="stix", valid_from=datetime.datetime.now(datetime.timezone.utc), labels=["malicious-activity"] )) bundle = Bundle(objects=indicators) with open("iocs_stix.json", "w") as f: f.write(bundle.serialize(pretty=True)) # Export as CSV for SIEM ingestion import csv with open("iocs.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerow(["type", "value", "context", "confidence"]) writer.writerow(["sha256", sha256_hash, "malware_sample", "high"]) for ip in ips: writer.writerow(["ipv4", ip, "c2_server", "high"]) for domain in domains: writer.writerow(["domain", domain, "c2_domain", "high"]) for url in urls: writer.writerow(["url", url, "c2_url", "high"]) ``` ## Key Concepts | Term | Definition | |------|------------| | **IOC (Indicator of Compromise)** | Forensic artifact observed in a network or system that indicates a potential intrusion: hashes, IPs, domains, file paths, registry keys | | **Defanging** | Modifying IOCs to prevent accidental activation (e.g., replacing dots with [.] in URLs and IPs for safe sharing in reports) | | **Imphash** | MD5 hash of the import table functions in a PE file; samples from the same malware family often share the same imphash | | **STIX/TAXII** | Structured Threat Information Expression / Trusted Automated Exchange; standards for encoding and transmitting threat intelligence | | **JA3/JA3S** | TLS client/server fingerprint based on ClientHello/ServerHello parameters; identifies specific malware families by their TLS implementation | | **Fuzzy Hashing (ssdeep)** | Context-triggered piecewise hashing that identifies similar files even with minor modifications; useful for malware variant detection | | **MISP** | Malware Information Sharing Platform; open-source threat intelligence platform for collecting, storing, and sharing IOCs | ## Tools & Systems - **iocextract (Python)**: Automated IOC extraction library supporting IPs, URLs, domains, hashes, and YARA rules from text - **MISP**: Open-source threat intelligence sharing platform for structured IOC management and distribution - **CyberChef**: Web-based tool for decoding, decrypting, and transforming data useful for deobfuscating encoded IOCs - **tshark**: Command-line network protocol analyzer for extracting network IOCs from PCAP files - **VirusTotal**: Online service for validating and enriching IOCs with community detection results and threat intelligence ## Common Scenarios ### Scenario: Building a Comprehensive IOC Package from a Ransomware Sample **Context**: A ransomware incident requires rapid IOC extraction for blocking across the enterprise while the full investigation continues. Multiple data sources are available: the sample binary, PCAP from network monitoring, and a Cuckoo sandbox report. **Approach**: 1. Compute all file hashes (MD5, SHA-1, SHA-256, imphash, ssdeep) for the ransomware binary and any dropped files 2. Extract network IOCs from strings in the binary (hardcoded C2 addresses) 3. Parse the PCAP for DNS queries, HTTP requests, and TLS SNI fields 4. Extract host IOCs from the sandbox report (file paths, registry keys, mutexes, ransom note filenames) 5. Validate all network IOCs against VirusTotal to confirm malicious status and check for known associations 6. Defang all indicators and compile into STIX 2.1 format for sharing and CSV for SIEM ingestion 7. Submit to MISP event for organizational and community sharing **Pitfalls**: - Including IP addresses of legitimate CDNs or cloud services without validating context (e.g., AWS IPs used for hosting, not inherently malicious) - Not defanging URLs and IPs in reports, leading to accidental clicks or DNS resolution - Extracting strings from packed binaries (IOCs from packed samples are unreliable; unpack first) - Forgetting to include dropped file hashes (the initial dropper and the final payload are separate IOCs) ## Output Format ``` IOC EXTRACTION REPORT ====================== Sample: ransomware.exe Analysis Date: 2025-09-15 Analyst: [Name] FILE INDICATORS SHA-256: e3b0c44298fc1c149afbf4c8996fb924... SHA-1: da39a3ee5e6b4b0d3255bfef95601890afd80709 MD5: d41d8cd98f00b204e9800998ecf8427e Imphash: a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6 ssdeep: 3072:kJh3bN7fY+aUkJh3bN7fY+aU:kJh3R7aUkJh3R7aU NETWORK INDICATORS C2 IPs: 185.220.101[.]42, 91.215.85[.]17 C2 Domains: update.malicious[.]com, backup.evil[.]net C2 URLs: hxxps://update.malicious[.]com/gate.php hxxps://backup.evil[.]net/gate.php JA3 Hash: a0e9f5d64349fb13191bc781f81f42e1 User-Agent: Mozilla/5.0 (compatible; MSIE 10.0) HOST INDICATORS File Paths: C:\Users\Public\svchost.exe C:\Users\%USER%\AppData\Local\Temp\payload.dll C:\Users\%USER%\Desktop\README_DECRYPT.txt Registry Keys: HKCU\Software\Microsoft\Windows\CurrentVersion\Run\WindowsUpdate Mutexes: Global\CryptLocker_2025_Q3 Services: FakeWindowsUpdate CONFIDENCE ASSESSMENT High Confidence: SHA-256, C2 IPs (validated via VT), Mutexes Medium Confidence: Domains (could be compromised legitimate sites) Low Confidence: User-Agent (common string, high false positive risk) EXPORT FILES stix_bundle.json - STIX 2.1 format for TIP ingestion iocs.csv - Flat CSV for SIEM blocklist import yara_rule.yar - YARA detection rule ```