---
name: malware-forensics
description: |
  Analyze malware samples for forensic investigation. Use when investigating malware
  infections, determining malware capabilities, extracting IOCs, or understanding
  attack techniques. Supports static and dynamic analysis of executables, scripts,
  and documents.
license: Apache-2.0
compatibility: |
  - Python 3.9+
  - Optional: yara-python, pefile, oletools, ssdeep
metadata:
  author: SherifEldeeb
  version: "1.0.0"
  category: forensics
---

# Malware Forensics

Comprehensive malware forensics skill for analyzing malicious software samples. Enables static and dynamic analysis, extraction of indicators of compromise, attribution research, and documentation of malware capabilities for incident response and threat intelligence.

## Capabilities

- **Static Analysis**: Analyze malware without execution (strings, headers, imports)
- **PE Analysis**: Parse Windows executables, DLLs, and drivers
- **Document Analysis**: Analyze malicious Office documents and PDFs
- **Script Analysis**: Analyze malicious scripts (PowerShell, VBA, JavaScript)
- **IOC Extraction**: Extract IPs, domains, URLs, hashes, and other indicators
- **YARA Scanning**: Scan samples with YARA rules for identification
- **String Analysis**: Extract and categorize strings from samples
- **Behavior Analysis**: Document observed malware behavior
- **Unpacking Support**: Identify and document packed samples
- **Attribution Analysis**: Link samples to threat actors or campaigns

## Quick Start

```python
from malware_forensics import MalwareAnalyzer, PEAnalyzer, IOCExtractor

# Analyze sample
analyzer = MalwareAnalyzer("/samples/malware.exe")
report = analyzer.analyze()

# Extract IOCs
extractor = IOCExtractor("/samples/malware.exe")
iocs = extractor.extract_all()

# Scan with YARA
matches = analyzer.yara_scan("/rules/malware.yar")
```

## Usage

### Task 1: PE File Analysis

**Input**: Windows executable (EXE, DLL, SYS)

**Process**:
1. Parse PE headers
2. Analyze imports/exports
3. Check for anomalies
4. Extract resources
5. Calculate hashes

**Output**: PE analysis report

**Example**:
```python
from malware_forensics import PEAnalyzer

# Initialize PE analyzer
analyzer = PEAnalyzer("/samples/suspicious.exe")

# Get basic info
info = analyzer.get_basic_info()
print(f"File: {info.filename}")
print(f"Size: {info.size}")
print(f"MD5: {info.md5}")
print(f"SHA256: {info.sha256}")
print(f"SSDeep: {info.ssdeep}")
print(f"Type: {info.file_type}")

# Get PE headers
headers = analyzer.get_headers()
print(f"Machine: {headers.machine}")
print(f"Timestamp: {headers.timestamp}")
print(f"Subsystem: {headers.subsystem}")
print(f"Entry point: 0x{headers.entry_point:x}")
print(f"Image base: 0x{headers.image_base:x}")

# Get sections
sections = analyzer.get_sections()
for section in sections:
    print(f"Section: {section.name}")
    print(f" Virtual size: {section.virtual_size}")
    print(f" Raw size: {section.raw_size}")
    print(f" Entropy: {section.entropy}")
    print(f" MD5: {section.md5}")

# Get imports
imports = analyzer.get_imports()
for dll, functions in imports.items():
    print(f"Import: {dll}")
    for func in functions:
        print(f" - {func}")

# Get exports
exports = analyzer.get_exports()
for export in exports:
    print(f"Export: {export.name} @ {export.ordinal}")

# Detect anomalies
anomalies = analyzer.detect_anomalies()
for a in anomalies:
    print(f"ANOMALY: {a.type}")
    print(f" Description: {a.description}")
    print(f" Severity: {a.severity}")

# Get resources
resources = analyzer.get_resources()
for resource in resources:
    print(f"Resource: {resource.type}/{resource.name}")
    print(f" Size: {resource.size}")
    print(f" Language: {resource.language}")

# Check for packing
packing = analyzer.detect_packing()
print(f"Packed: {packing.is_packed}")
print(f"Packer: {packing.packer_name}")
print(f"Confidence: {packing.confidence}")

# Generate report
analyzer.generate_report("/evidence/pe_analysis.html")
```

### Task 2: String Analysis

**Input**: Malware sample

**Process**:
1. Extract ASCII/Unicode strings
2. Categorize by type
3. Identify suspicious strings
4. Extract encoded strings
5. Document findings

**Output**: String analysis with categorization

**Example**:
```python
from malware_forensics import StringAnalyzer

# Initialize string analyzer
analyzer = StringAnalyzer("/samples/malware.exe")

# Extract all strings
strings = analyzer.extract_all(min_length=4)
print(f"Total strings: {len(strings)}")

# Get strings by category
categorized = analyzer.categorize()

print(f"URLs: {len(categorized.urls)}")
for url in categorized.urls:
    print(f" {url}")

print(f"IPs: {len(categorized.ips)}")
for ip in categorized.ips:
    print(f" {ip}")

print(f"Domains: {len(categorized.domains)}")
for domain in categorized.domains:
    print(f" {domain}")

print(f"Registry keys: {len(categorized.registry)}")
for reg in categorized.registry:
    print(f" {reg}")

print(f"File paths: {len(categorized.file_paths)}")
for path in categorized.file_paths:
    print(f" {path}")

# Find suspicious strings
suspicious = analyzer.find_suspicious()
for s in suspicious:
    print(f"SUSPICIOUS: {s.value}")
    print(f" Category: {s.category}")
    print(f" Reason: {s.reason}")

# Decode encoded strings
decoded = analyzer.decode_strings()
for d in decoded:
    print(f"Encoded: {d.encoded[:50]}...")
    print(f" Encoding: {d.encoding}")
    print(f" Decoded: {d.decoded}")

# Find strings with XOR patterns
xor_strings = analyzer.find_xor_encoded()
for x in xor_strings:
    print(f"XOR key 0x{x.key:02x}: {x.decoded}")

# Export strings
analyzer.export("/evidence/strings.txt")
analyzer.export_json("/evidence/strings.json")
```

### Task 3: Document Analysis

**Input**: Malicious document (Office, PDF)

**Process**:
1. Parse document structure
2. Extract macros/scripts
3. Analyze embedded objects
4. Detect exploits
5. Extract payloads

**Output**: Document analysis report

**Example**:
```python
from malware_forensics import DocumentAnalyzer

# Analyze Office document
analyzer = DocumentAnalyzer("/samples/malicious.docx")

# Get document info
info = analyzer.get_info()
print(f"Format: {info.format}")
print(f"Created: {info.created}")
print(f"Modified: {info.modified}")
print(f"Author: {info.author}")
print(f"Has macros: {info.has_macros}")
print(f"Has embedded: {info.has_embedded}")

# Extract macros
macros = analyzer.extract_macros()
for macro in macros:
    print(f"Macro: {macro.name}")
    print(f" Type: {macro.type}")
    print(f" Code preview: {macro.code[:200]}...")
    print(f" Suspicious: {macro.is_suspicious}")

# Analyze VBA code
vba_analysis = analyzer.analyze_vba()
for finding in vba_analysis.findings:
    print(f"VBA Finding: {finding.type}")
    print(f" Description: {finding.description}")
    print(f" Code: {finding.code_snippet}")

# Get auto-execute triggers
triggers = analyzer.get_auto_triggers()
for trigger in triggers:
    print(f"Trigger: {trigger.name}")
    print(f" Type: {trigger.trigger_type}")

# Extract embedded objects
embedded = analyzer.extract_embedded("/evidence/embedded/")
for obj in embedded:
    print(f"Embedded: {obj.filename}")
    print(f" Type: {obj.content_type}")
    print(f" SHA256: {obj.sha256}")

# Detect exploits
exploits = analyzer.detect_exploits()
for exploit in exploits:
    print(f"EXPLOIT: {exploit.cve}")
    print(f" Description: {exploit.description}")
    print(f" Confidence: {exploit.confidence}")

# PDF-specific analysis
if info.format == "PDF":
    pdf_analysis = analyzer.analyze_pdf_structure()
    print(f"JavaScript: {pdf_analysis.has_javascript}")
    print(f"OpenAction: {pdf_analysis.has_openaction}")
    print(f"Embedded files: {pdf_analysis.embedded_files}")

# Generate report
analyzer.generate_report("/evidence/document_analysis.html")
```

### Task 4: Script Analysis

**Input**: Malicious script file

**Process**:
1. Identify script type
2. Deobfuscate code
3. Analyze behavior
4. Extract IOCs
5. Document capabilities

**Output**: Script analysis report

**Example**:
```python
from malware_forensics import ScriptAnalyzer

# Analyze script
analyzer = ScriptAnalyzer("/samples/malicious.ps1")

# Get script info
info = analyzer.get_info()
print(f"Type: {info.script_type}")
print(f"Size: {info.size}")
print(f"Encoding: {info.encoding}")
print(f"Obfuscated: {info.is_obfuscated}")

# Deobfuscate script
deobfuscated = analyzer.deobfuscate()
print(f"Deobfuscation stages: {deobfuscated.stages}")
print(f"Final code preview: {deobfuscated.final_code[:500]}...")

# Analyze PowerShell-specific features
if info.script_type == "PowerShell":
    ps_analysis = analyzer.analyze_powershell()
    print(f"Download cradles: {ps_analysis.download_cradles}")
    print(f"Encoded commands: {ps_analysis.encoded_commands}")
    print(f"Bypass techniques: {ps_analysis.bypass_techniques}")

# Get suspicious patterns
patterns = analyzer.find_suspicious_patterns()
for p in patterns:
    print(f"Pattern: {p.name}")
    print(f" Description: {p.description}")
    print(f" Code: {p.code_snippet}")
    print(f" MITRE: {p.mitre_technique}")

# Analyze JavaScript
if info.script_type == "JavaScript":
    js_analysis = analyzer.analyze_javascript()
    print(f"Eval calls: {js_analysis.eval_calls}")
    print(f"Document.write: {js_analysis.document_writes}")
    print(f"External requests: {js_analysis.external_requests}")

# Extract IOCs
iocs = analyzer.extract_iocs()
print(f"URLs: {iocs.urls}")
print(f"Domains: {iocs.domains}")
print(f"IPs: {iocs.ips}")

# Get execution flow
flow = analyzer.analyze_execution_flow()
for step in flow:
    print(f"Step {step.order}: {step.description}")
    print(f" Action: {step.action}")

# Generate report
analyzer.generate_report("/evidence/script_analysis.html")
```

### Task 5: IOC Extraction

**Input**: Malware sample

**Process**:
1. Extract network IOCs
2. Extract file IOCs
3. Extract registry IOCs
4. Deduplicate and validate
5. Export in multiple formats

**Output**: IOC collection

**Example**:
```python
from malware_forensics import IOCExtractor

# Initialize extractor
extractor = IOCExtractor("/samples/malware.exe")

# Extract all IOCs
iocs = extractor.extract_all()

# Network IOCs
print(f"URLs ({len(iocs.urls)}):")
for url in iocs.urls:
    print(f" {url.value}")
    print(f" Context: {url.context}")
    print(f" Confidence: {url.confidence}")

print(f"Domains ({len(iocs.domains)}):")
for domain in iocs.domains:
    print(f" {domain.value}")

print(f"IPs ({len(iocs.ips)}):")
for ip in iocs.ips:
    print(f" {ip.value}")
    print(f" Type: {ip.ip_type}")  # C2, download, etc.

# File IOCs
print(f"File paths ({len(iocs.file_paths)}):")
for path in iocs.file_paths:
    print(f" {path.value}")

print(f"File hashes ({len(iocs.hashes)}):")
for h in iocs.hashes:
    print(f" {h.algorithm}: {h.value}")

# Registry IOCs
print(f"Registry keys ({len(iocs.registry_keys)}):")
for reg in iocs.registry_keys:
    print(f" {reg.value}")

# Mutexes
print(f"Mutexes ({len(iocs.mutexes)}):")
for mutex in iocs.mutexes:
    print(f" {mutex.value}")

# Validate IOCs
validated = extractor.validate_iocs(iocs)
print(f"Valid IOCs: {validated.valid_count}")
print(f"Invalid IOCs: {validated.invalid_count}")

# Enrich IOCs
enriched = extractor.enrich_iocs(
    iocs,
    sources=["virustotal", "threatfox", "urlhaus"]
)

# Export IOCs
extractor.export_csv("/evidence/iocs.csv")
extractor.export_json("/evidence/iocs.json")
extractor.export_stix("/evidence/iocs.stix")
extractor.export_misp("/evidence/iocs.misp.json")
```

### Task 6: YARA Scanning

**Input**: Malware sample(s) and YARA rules

**Process**:
1. Compile YARA rules
2. Scan samples
3. Collect matches
4. Document findings
5. Generate report

**Output**: YARA scan results

**Example**:
```python
from malware_forensics import YARAScanner

# Initialize scanner
scanner = YARAScanner()

# Add rule files
scanner.add_rules("/rules/malware_families.yar")
scanner.add_rules("/rules/packers.yar")
scanner.add_rules("/rules/exploits.yar")

# Add rule directory
scanner.add_rule_directory("/rules/")

# Scan single file
matches = scanner.scan_file("/samples/malware.exe")
for match in matches:
    print(f"Rule: {match.rule}")
    print(f" Namespace: {match.namespace}")
    print(f" Tags: {match.tags}")
    print(f" Meta: {match.meta}")
    print(f" Strings matched:")
    for s in match.strings:
        print(f" {s.identifier}: {s.data} @ 0x{s.offset:x}")

# Scan directory
dir_matches = scanner.scan_directory("/samples/")
for file_path, matches in dir_matches.items():
    if matches:
        print(f"File: {file_path}")
        for match in matches:
            print(f" - {match.rule}")

# Scan with specific rules
specific = scanner.scan_file(
    "/samples/malware.exe",
    rules=["APT_Malware", "Ransomware"]
)

# Get statistics
stats = scanner.get_statistics()
print(f"Files scanned: {stats.files_scanned}")
print(f"Rules loaded: {stats.rules_loaded}")
print(f"Matches found: {stats.total_matches}")

# Export results
scanner.export_results("/evidence/yara_results.json")
scanner.generate_report("/evidence/yara_report.html")
```

### Task 7: Behavior Analysis

**Input**: Malware execution observations

**Process**:
1. Document file operations
2. Track registry changes
3. Monitor network activity
4. Identify persistence
5. Map to MITRE ATT&CK

**Output**: Behavior analysis report

**Example**:
```python
from malware_forensics import BehaviorAnalyzer

# Initialize analyzer with sandbox report
analyzer = BehaviorAnalyzer()
analyzer.load_sandbox_report("/evidence/sandbox_report.json")

# Or manually add observations
analyzer.add_file_operation("create", "C:\\Windows\\Temp\\malware.exe")
analyzer.add_registry_operation("create", "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\Malware")
analyzer.add_network_connection("tcp", "203.0.113.50", 443)
analyzer.add_process_creation("cmd.exe", "/c whoami")

# Get file operations
file_ops = analyzer.get_file_operations()
for op in file_ops:
    print(f"File: {op.operation} - {op.path}")
    print(f" Time: {op.timestamp}")

# Get registry operations
reg_ops = analyzer.get_registry_operations()
for op in reg_ops:
    print(f"Registry: {op.operation} - {op.key}")
    print(f" Value: {op.value}")

# Get network activity
network = analyzer.get_network_activity()
for conn in network:
    print(f"Network: {conn.protocol} {conn.destination}:{conn.port}")
    print(f" DNS: {conn.dns_query}")

# Get process activity
processes = analyzer.get_process_activity()
for proc in processes:
    print(f"Process: {proc.name}")
    print(f" Command: {proc.command_line}")
    print(f" Parent: {proc.parent}")

# Map to MITRE ATT&CK
mitre = analyzer.map_to_mitre()
for technique in mitre:
    print(f"Technique: {technique.id} - {technique.name}")
    print(f" Tactic: {technique.tactic}")
    print(f" Evidence: {technique.evidence}")

# Identify capabilities
capabilities = analyzer.identify_capabilities()
for cap in capabilities:
    print(f"Capability: {cap.name}")
    print(f" Description: {cap.description}")
    print(f" Confidence: {cap.confidence}")

# Generate behavior report
analyzer.generate_report("/evidence/behavior_analysis.html")
```

### Task 8: Sample Comparison

**Input**: Multiple malware samples

**Process**:
1. Calculate similarity hashes
2. Compare code sections
3. Identify shared IOCs
4. Find common patterns
5. Cluster related samples

**Output**: Sample comparison results

**Example**:
```python
from malware_forensics import SampleComparator

# Initialize comparator
comparator = SampleComparator()

# Add samples
comparator.add_sample("/samples/sample1.exe")
comparator.add_sample("/samples/sample2.exe")
comparator.add_sample("/samples/sample3.exe")

# Or add directory
comparator.add_directory("/samples/")

# Compare all samples
comparison = comparator.compare_all()

# Get similarity matrix
matrix = comparison.similarity_matrix
for sample1, similarities in matrix.items():
    for sample2, score in similarities.items():
        if score > 0.8:
            print(f"{sample1} <-> {sample2}: {score:.2f}")

# Find clusters
clusters = comparator.cluster_samples(threshold=0.7)
for i, cluster in enumerate(clusters):
    print(f"Cluster {i + 1}:")
    for sample in cluster:
        print(f" - {sample}")

# Compare specific samples
detail = comparator.compare_pair("/samples/a.exe", "/samples/b.exe")
print(f"Overall similarity: {detail.overall_score}")
print(f"Section similarity: {detail.section_scores}")
print(f"Import similarity: {detail.import_score}")
print(f"String similarity: {detail.string_score}")
print(f"Shared IOCs: {detail.shared_iocs}")

# Find shared code
shared_code = comparator.find_shared_code()
for code in shared_code:
    print(f"Shared code block at offset {code.offset}")
    print(f" Size: {code.size}")
    print(f" Samples: {code.samples}")

# Generate comparison report
comparator.generate_report("/evidence/comparison_report.html")
```

### Task 9: Attribution Analysis

**Input**: Malware sample with IOCs

**Process**:
1. Match against known families
2. Compare with threat intel
3. Identify TTPs
4. Find related campaigns
5. Document attribution

**Output**: Attribution analysis

**Example**:
```python
from malware_forensics import AttributionAnalyzer

# Initialize analyzer
analyzer = AttributionAnalyzer("/samples/malware.exe")

# Match against malware families
families = analyzer.match_malware_families()
for family in families:
    print(f"Family: {family.name}")
    print(f" Confidence: {family.confidence}")
    print(f" Matching indicators: {family.indicators}")

# Check against threat intel
threat_intel = analyzer.check_threat_intel(
    feeds=["malwarebazaar", "virustotal", "threatfox"]
)
for intel in threat_intel:
    print(f"Intel: {intel.source}")
    print(f" Family: {intel.family}")
    print(f" Tags: {intel.tags}")
    print(f" First seen: {intel.first_seen}")

# Match TTPs to threat actors
actors = analyzer.match_threat_actors()
for actor in actors:
    print(f"Threat Actor: {actor.name}")
    print(f" Confidence: {actor.confidence}")
    print(f" Matching TTPs: {actor.matching_ttps}")
    print(f" Known aliases: {actor.aliases}")

# Find related campaigns
campaigns = analyzer.find_related_campaigns()
for campaign in campaigns:
    print(f"Campaign: {campaign.name}")
    print(f" Time range: {campaign.start_date} - {campaign.end_date}")
    print(f" Targets: {campaign.targets}")

# Get attribution summary
summary = analyzer.get_attribution_summary()
print(f"Most likely family: {summary.primary_family}")
print(f"Most likely actor: {summary.primary_actor}")
print(f"Confidence: {summary.overall_confidence}")

# Generate attribution report
analyzer.generate_report("/evidence/attribution.html")
```

### Task 10: Malware Triage

**Input**: Collection of suspicious files

**Process**:
1. Calculate hashes
2. Check against known malware
3. Quick static analysis
4. Prioritize for analysis
5. Generate triage report

**Output**: Triage results with priorities

**Example**:
```python
from malware_forensics import MalwareTriage

# Initialize triage
triage = MalwareTriage()

# Add samples
triage.add_directory("/quarantine/")

# Run triage
results = triage.run()
print(f"Total samples: {results.total}")
print(f"Known malware: {results.known_malware}")
print(f"Suspicious: {results.suspicious}")
print(f"Clean: {results.clean}")
print(f"Unknown: {results.unknown}")

# Get prioritized list
prioritized = triage.get_prioritized()
for sample in prioritized:
    print(f"Priority {sample.priority}: {sample.filename}")
    print(f" Status: {sample.status}")
    print(f" Reason: {sample.reason}")
    print(f" Risk score: {sample.risk_score}")

# Get known malware
known = triage.get_known_malware()
for m in known:
    print(f"Known: {m.filename}")
    print(f" Family: {m.family}")
    print(f" Detection: {m.detection_name}")

# Get suspicious files
suspicious = triage.get_suspicious()
for s in suspicious:
    print(f"Suspicious: {s.filename}")
    print(f" Indicators: {s.indicators}")

# Export triage results
triage.export_csv("/evidence/triage_results.csv")
triage.generate_report("/evidence/triage_report.html")
```

## Configuration

### Environment Variables

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `YARA_RULES_PATH` | Default YARA rules directory | No | ./rules |
| `VT_API_KEY` | VirusTotal API key | No | None |
| `MALWARE_BAZAAR_KEY` | MalwareBazaar API key | No | None |
| `SANDBOX_API` | Sandbox service API URL | No | None |

### Options

| Option | Type | Description |
|--------|------|-------------|
| `auto_deobfuscate` | boolean | Auto-deobfuscate scripts |
| `extract_resources` | boolean | Extract PE resources |
| `deep_string_analysis` | boolean | Extended string analysis |
| `check_threat_intel` | boolean | Check against threat intel |
| `parallel` | boolean | Enable parallel processing |

## Examples

### Example 1: Incident Response Analysis

**Scenario**: Analyzing malware from compromised system

```python
from malware_forensics import MalwareAnalyzer, IOCExtractor

# Analyze malware sample
analyzer = MalwareAnalyzer("/evidence/malware.exe")
analysis = analyzer.full_analysis()

# Extract IOCs for blocking
extractor = IOCExtractor("/evidence/malware.exe")
iocs = extractor.extract_all()

# Export for SIEM
extractor.export_stix("/evidence/block_iocs.stix")

# Generate IR report
analyzer.generate_ir_report("/evidence/malware_ir_report.html")
```

### Example 2: Threat Intelligence

**Scenario**: Analyzing new malware for threat intel

```python
from malware_forensics import MalwareAnalyzer, AttributionAnalyzer

# Full analysis
analyzer = MalwareAnalyzer("/samples/new_sample.exe")
analysis = analyzer.full_analysis()

# Attribution
attribution = AttributionAnalyzer("/samples/new_sample.exe")
actor = attribution.match_threat_actors()
campaigns = attribution.find_related_campaigns()

# Generate threat intel report
analyzer.generate_threat_intel_report("/evidence/threat_intel.html")
```

## Limitations

- Static analysis cannot detect runtime behavior
- Packed samples may require manual unpacking
- Obfuscated code may resist analysis
- Attribution has inherent uncertainty
- Requires safe environment for handling samples
- Some formats may have limited support
- Threat intel depends on available data

## Troubleshooting

### Common Issue 1: PE Parsing Failure

**Problem**: Cannot parse PE file

**Solution**:
- Check file integrity
- May be packed or corrupted
- Try different parsing options

### Common Issue 2: Deobfuscation Failure

**Problem**: Script remains obfuscated

**Solution**:
- Try manual deobfuscation
- Use dynamic analysis
- Check for custom obfuscation

### Common Issue 3: YARA Rule Errors

**Problem**: YARA rules fail to compile

**Solution**:
- Check rule syntax
- Verify string escape sequences
- Update YARA version

## Related Skills

- [memory-forensics](../memory-forensics/): Memory-based malware analysis
- [disk-forensics](../disk-forensics/): Find malware artifacts on disk
- [network-forensics](../network-forensics/): Analyze malware traffic
- [timeline-forensics](../timeline-forensics/): Malware timeline integration
- [artifact-collection](../artifact-collection/): Sample collection procedures

## References

- [Malware Analysis Reference](references/REFERENCE.md)
- [YARA Rule Writing Guide](references/YARA_GUIDE.md)
- [PE Format Specification](references/PE_FORMAT.md)
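
## Appendix: Standard-Library Static Analysis Sketch

The hashing, entropy, string-extraction, and network IOC steps listed under Tasks 1, 2, and 5 can be approximated with the Python standard library alone. The sketch below is not how `malware_forensics` is implemented, which this document does not show. The function names (`file_hashes`, `shannon_entropy`, `ascii_strings`, `extract_network_iocs`), the regular expressions, and the sample bytes are all assumptions made for illustration.

```python
import hashlib
import ipaddress
import math
import re
from collections import Counter

# Hypothetical helper names; none of these come from malware_forensics.

def file_hashes(data: bytes) -> dict[str, str]:
    """Return MD5, SHA-1, and SHA-256 hex digests of the raw bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        -(n / total) * math.log2(n / total) for n in Counter(data).values()
    )

def ascii_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.decode("ascii") for m in pattern.findall(data)]

# Deliberately simple patterns; real extractors handle many more cases.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_network_iocs(strings: list[str]) -> dict[str, list[str]]:
    """Collect deduplicated URLs and syntactically valid IPv4 addresses."""
    urls: set[str] = set()
    ips: set[str] = set()
    for s in strings:
        urls.update(URL_RE.findall(s))
        for candidate in IPV4_RE.findall(s):
            try:
                ipaddress.IPv4Address(candidate)  # rejects octets above 255
            except ValueError:
                continue
            ips.add(candidate)
    return {"urls": sorted(urls), "ips": sorted(ips)}

if __name__ == "__main__":
    # Synthetic bytes standing in for a sample; never a real binary.
    sample = (
        b"\x00\x01MZ\x90\x00http://example.test/payload.bin\x00"
        b"\xff203.0.113.50\x00999.1.1.1\x00ab\x00"
    )
    print(file_hashes(sample))
    print(f"Entropy: {shannon_entropy(sample):.2f} bits/byte")
    found = ascii_strings(sample)
    print(found)
    print(extract_network_iocs(found))
```

High entropy is only a hint of packing or encryption, not proof, and the address check confirms syntax only. Both fit the limitations listed above for static analysis.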