---
name: malware-forensics
description: |
  Analyze malware samples for forensic investigation. Use when investigating malware
  infections, determining malware capabilities, extracting IOCs, or understanding
  attack techniques. Supports static and dynamic analysis of executables, scripts,
  and documents.
license: Apache-2.0
compatibility: |
  - Python 3.9+
  - Optional: yara-python, pefile, oletools, ssdeep
metadata:
  author: SherifEldeeb
  version: "1.0.0"
  category: forensics
---

# Malware Forensics

Comprehensive malware forensics skill for analyzing malicious software samples. It supports static analysis, review of dynamic (sandbox) behavior, extraction of indicators of compromise (IOCs), attribution research, and documentation of malware capabilities for incident response and threat intelligence.

## Capabilities

- **Static Analysis**: Analyze malware without execution (strings, headers, imports)
- **PE Analysis**: Parse Windows executables, DLLs, and drivers
- **Document Analysis**: Analyze malicious Office documents and PDFs
- **Script Analysis**: Analyze malicious scripts (PowerShell, VBA, JavaScript)
- **IOC Extraction**: Extract IPs, domains, URLs, hashes, and other indicators
- **YARA Scanning**: Scan samples with YARA rules for identification
- **String Analysis**: Extract and categorize strings from samples
- **Behavior Analysis**: Document observed runtime behavior (e.g., from sandbox reports) and map it to MITRE ATT&CK
- **Unpacking Support**: Identify and document packed samples
- **Attribution Analysis**: Link samples to threat actors or campaigns

## Quick Start

```python
from malware_forensics import MalwareAnalyzer, PEAnalyzer, IOCExtractor

# Analyze sample
analyzer = MalwareAnalyzer("/samples/malware.exe")
report = analyzer.analyze()

# Extract IOCs
extractor = IOCExtractor("/samples/malware.exe")
iocs = extractor.extract_all()

# Scan with YARA
matches = analyzer.yara_scan("/rules/malware.yar")
```

## Usage

### Task 1: PE File Analysis
**Input**: Windows executable (EXE, DLL, SYS)

**Process**:
1. Parse PE headers
2. Analyze imports/exports
3. Check for anomalies
4. Extract resources
5. Calculate hashes

**Output**: PE analysis report

**Example**:
```python
from malware_forensics import PEAnalyzer

# Initialize PE analyzer
analyzer = PEAnalyzer("/samples/suspicious.exe")

# Get basic info
info = analyzer.get_basic_info()
print(f"File: {info.filename}")
print(f"Size: {info.size}")
print(f"MD5: {info.md5}")
print(f"SHA256: {info.sha256}")
print(f"SSDeep: {info.ssdeep}")
print(f"Type: {info.file_type}")

# Get PE headers
headers = analyzer.get_headers()
print(f"Machine: {headers.machine}")
print(f"Timestamp: {headers.timestamp}")
print(f"Subsystem: {headers.subsystem}")
print(f"Entry point: 0x{headers.entry_point:x}")
print(f"Image base: 0x{headers.image_base:x}")

# Get sections
sections = analyzer.get_sections()
for section in sections:
    print(f"Section: {section.name}")
    print(f"  Virtual size: {section.virtual_size}")
    print(f"  Raw size: {section.raw_size}")
    print(f"  Entropy: {section.entropy}")
    print(f"  MD5: {section.md5}")

# Get imports
imports = analyzer.get_imports()
for dll, functions in imports.items():
    print(f"Import: {dll}")
    for func in functions:
        print(f"  - {func}")

# Get exports
exports = analyzer.get_exports()
for export in exports:
    print(f"Export: {export.name} @ {export.ordinal}")

# Detect anomalies
anomalies = analyzer.detect_anomalies()
for a in anomalies:
    print(f"ANOMALY: {a.type}")
    print(f"  Description: {a.description}")
    print(f"  Severity: {a.severity}")

# Get resources
resources = analyzer.get_resources()
for resource in resources:
    print(f"Resource: {resource.type}/{resource.name}")
    print(f"  Size: {resource.size}")
    print(f"  Language: {resource.language}")

# Check for packing
packing = analyzer.detect_packing()
print(f"Packed: {packing.is_packed}")
print(f"Packer: {packing.packer_name}")
print(f"Confidence: {packing.confidence}")

# Generate report
analyzer.generate_report("/evidence/pe_analysis.html")
```
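To show which header fields `get_headers()` presumably reports, the sketch below parses them directly from the file with the standard-library `struct` module, following the Microsoft PE/COFF layout. `pefile` (an optional dependency) handles far more edge cases; `parse_pe_header` and its dict keys are illustrative and are not part of the skill's API.

```python
import struct
from datetime import datetime, timezone

# Common IMAGE_FILE_MACHINE_* values; anything else is reported as hex
MACHINES = {0x14C: "I386", 0x8664: "AMD64", 0x1C0: "ARM", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000


def parse_pe_header(data: bytes) -> dict:
    """Read a handful of COFF and optional-header fields from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS header) signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature at e_lfanew")

    coff = pe_offset + 4  # the 20-byte COFF file header follows the signature
    machine, n_sections, timestamp = struct.unpack_from("<HHI", data, coff)
    (characteristics,) = struct.unpack_from("<H", data, coff + 18)

    opt = coff + 20  # the optional header starts right after the COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:    # PE32: 4-byte ImageBase after BaseOfData
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase, no BaseOfData field
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic 0x{magic:x}")
    # Subsystem sits at offset 68 in both PE32 and PE32+ optional headers
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)

    return {
        "machine": MACHINES.get(machine, f"0x{machine:x}"),
        "sections": n_sections,
        # TimeDateStamp is attacker-controlled and is often zeroed or forged
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "pe32_plus": magic == 0x20B,
        "entry_point": entry_point,
        "image_base": image_base,
        "subsystem": subsystem,
    }
```

Passing the whole file (`parse_pe_header(Path(path).read_bytes())`) is simplest. A `subsystem` of 2 indicates a Windows GUI application and 3 a console application.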

### Task 2: String Analysis
**Input**: Malware sample

**Process**:
1. Extract ASCII/Unicode strings
2. Categorize by type
3. Identify suspicious strings
4. Extract encoded strings
5. Document findings

**Output**: String analysis with categorization

**Example**:
```python
from malware_forensics import StringAnalyzer

# Initialize string analyzer
analyzer = StringAnalyzer("/samples/malware.exe")

# Extract all strings
strings = analyzer.extract_all(min_length=4)
print(f"Total strings: {len(strings)}")

# Get strings by category
categorized = analyzer.categorize()

print(f"URLs: {len(categorized.urls)}")
for url in categorized.urls:
    print(f"  {url}")

print(f"IPs: {len(categorized.ips)}")
for ip in categorized.ips:
    print(f"  {ip}")

print(f"Domains: {len(categorized.domains)}")
for domain in categorized.domains:
    print(f"  {domain}")

print(f"Registry keys: {len(categorized.registry)}")
for reg in categorized.registry:
    print(f"  {reg}")

print(f"File paths: {len(categorized.file_paths)}")
for path in categorized.file_paths:
    print(f"  {path}")

# Find suspicious strings
suspicious = analyzer.find_suspicious()
for s in suspicious:
    print(f"SUSPICIOUS: {s.value}")
    print(f"  Category: {s.category}")
    print(f"  Reason: {s.reason}")

# Decode encoded strings
decoded = analyzer.decode_strings()
for d in decoded:
    print(f"Encoded: {d.encoded[:50]}...")
    print(f"  Encoding: {d.encoding}")
    print(f"  Decoded: {d.decoded}")

# Find strings with XOR patterns
xor_strings = analyzer.find_xor_encoded()
for x in xor_strings:
    print(f"XOR key 0x{x.key:02x}: {x.decoded}")

# Export strings
analyzer.export("/evidence/strings.txt")
analyzer.export_json("/evidence/strings.json")
```
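Step 1 above (extracting ASCII and Unicode strings) works much like the classic `strings` utility: it finds runs of printable bytes, and for Windows "wide" strings it finds printable bytes interleaved with zero bytes (UTF-16LE). The sketch below shows that idea with the standard library only; `extract_strings` is a hypothetical helper, not the implementation behind `StringAnalyzer.extract_all`.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return (offset, encoding, text) tuples for ASCII and UTF-16LE strings."""
    # Printable ASCII runs (space through tilde) of at least min_length bytes
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # UTF-16LE runs: each printable ASCII byte is followed by a 0x00 byte
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)

    results = []
    for m in ascii_re.finditer(data):
        results.append((m.start(), "ascii", m.group().decode("ascii")))
    for m in wide_re.finditer(data):
        results.append((m.start(), "utf-16le", m.group().decode("utf-16le")))
    return sorted(results)  # ordered by file offset
```

Only ASCII-range wide strings are found this way; non-Latin UTF-16 text needs a broader pattern.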

### Task 3: Document Analysis
**Input**: Malicious document (Office, PDF)

**Process**:
1. Parse document structure
2. Extract macros/scripts
3. Analyze embedded objects
4. Detect exploits
5. Extract payloads

**Output**: Document analysis report

**Example**:
```python
from malware_forensics import DocumentAnalyzer

# Analyze Office document
analyzer = DocumentAnalyzer("/samples/malicious.docx")

# Get document info
info = analyzer.get_info()
print(f"Format: {info.format}")
print(f"Created: {info.created}")
print(f"Modified: {info.modified}")
print(f"Author: {info.author}")
print(f"Has macros: {info.has_macros}")
print(f"Has embedded: {info.has_embedded}")

# Extract macros
macros = analyzer.extract_macros()
for macro in macros:
    print(f"Macro: {macro.name}")
    print(f"  Type: {macro.type}")
    print(f"  Code preview: {macro.code[:200]}...")
    print(f"  Suspicious: {macro.is_suspicious}")

# Analyze VBA code
vba_analysis = analyzer.analyze_vba()
for finding in vba_analysis.findings:
    print(f"VBA Finding: {finding.type}")
    print(f"  Description: {finding.description}")
    print(f"  Code: {finding.code_snippet}")

# Get auto-execute triggers
triggers = analyzer.get_auto_triggers()
for trigger in triggers:
    print(f"Trigger: {trigger.name}")
    print(f"  Type: {trigger.trigger_type}")

# Extract embedded objects
embedded = analyzer.extract_embedded("/evidence/embedded/")
for obj in embedded:
    print(f"Embedded: {obj.filename}")
    print(f"  Type: {obj.content_type}")
    print(f"  SHA256: {obj.sha256}")

# Detect exploits
exploits = analyzer.detect_exploits()
for exploit in exploits:
    print(f"EXPLOIT: {exploit.cve}")
    print(f"  Description: {exploit.description}")
    print(f"  Confidence: {exploit.confidence}")

# PDF-specific analysis
if info.format == "PDF":
    pdf_analysis = analyzer.analyze_pdf_structure()
    print(f"JavaScript: {pdf_analysis.has_javascript}")
    print(f"OpenAction: {pdf_analysis.has_openaction}")
    print(f"Embedded files: {pdf_analysis.embedded_files}")

# Generate report
analyzer.generate_report("/evidence/document_analysis.html")
```
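Before any macro parsing, it helps to know which container you are dealing with. Legacy Office files (`.doc`, `.xls`, `.ppt`) are OLE compound files starting with the bytes `D0 CF 11 E0 A1 B1 1A E1`; OOXML files are ZIP packages in which macro-enabled documents carry a `vbaProject.bin` part. The sketch below assumes only the standard library; `inspect_office_container` is a hypothetical helper. It also flags relationship parts marked `TargetMode="External"`, which is how remote template references are stored, although ordinary hyperlinks are flagged too, so treat those hits as leads.

```python
import zipfile
from pathlib import Path

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature


def inspect_office_container(path) -> dict:
    """Identify the container type and list macro / external-reference parts."""
    path = Path(path)
    with path.open("rb") as f:
        head = f.read(8)
    if head == OLE_MAGIC:
        # Legacy format: VBA streams live inside the OLE container and need an
        # OLE parser (e.g., oletools' olevba) to extract
        return {"container": "ole", "vba_parts": [], "external_rels": []}
    if not zipfile.is_zipfile(path):
        return {"container": "unknown", "vba_parts": [], "external_rels": []}

    vba_parts, external_rels = [], []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # Macro-enabled OOXML stores compiled VBA in a vbaProject.bin part
            if name.lower().endswith("vbaproject.bin"):
                vba_parts.append(name)
            # Relationships pointing outside the package (templates, links)
            elif name.endswith(".rels") and b'TargetMode="External"' in zf.read(name):
                external_rels.append(name)
    return {"container": "ooxml", "vba_parts": vba_parts, "external_rels": external_rels}
```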

### Task 4: Script Analysis
**Input**: Malicious script file

**Process**:
1. Identify script type
2. Deobfuscate code
3. Analyze behavior
4. Extract IOCs
5. Document capabilities

**Output**: Script analysis report

**Example**:
```python
from malware_forensics import ScriptAnalyzer

# Analyze script
analyzer = ScriptAnalyzer("/samples/malicious.ps1")

# Get script info
info = analyzer.get_info()
print(f"Type: {info.script_type}")
print(f"Size: {info.size}")
print(f"Encoding: {info.encoding}")
print(f"Obfuscated: {info.is_obfuscated}")

# Deobfuscate script
deobfuscated = analyzer.deobfuscate()
print(f"Deobfuscation stages: {deobfuscated.stages}")
print(f"Final code preview: {deobfuscated.final_code[:500]}...")

# Analyze PowerShell-specific features
if info.script_type == "PowerShell":
    ps_analysis = analyzer.analyze_powershell()
    print(f"Download cradles: {ps_analysis.download_cradles}")
    print(f"Encoded commands: {ps_analysis.encoded_commands}")
    print(f"Bypass techniques: {ps_analysis.bypass_techniques}")

# Get suspicious patterns
patterns = analyzer.find_suspicious_patterns()
for p in patterns:
    print(f"Pattern: {p.name}")
    print(f"  Description: {p.description}")
    print(f"  Code: {p.code_snippet}")
    print(f"  MITRE: {p.mitre_technique}")

# Analyze JavaScript
if info.script_type == "JavaScript":
    js_analysis = analyzer.analyze_javascript()
    print(f"Eval calls: {js_analysis.eval_calls}")
    print(f"Document.write: {js_analysis.document_writes}")
    print(f"External requests: {js_analysis.external_requests}")

# Extract IOCs
iocs = analyzer.extract_iocs()
print(f"URLs: {iocs.urls}")
print(f"Domains: {iocs.domains}")
print(f"IPs: {iocs.ips}")

# Get execution flow
flow = analyzer.analyze_execution_flow()
for step in flow:
    print(f"Step {step.order}: {step.description}")
    print(f"  Action: {step.action}")

# Generate report
analyzer.generate_report("/evidence/script_analysis.html")
```
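One of the most common PowerShell layers `deobfuscate()` has to peel is `-EncodedCommand` (often abbreviated `-enc` or `-e`), whose argument is the Base64 encoding of the command's UTF-16LE text. The sketch below decodes such arguments from a command line; `decode_encoded_commands` is a hypothetical helper, and the regex covers the common spellings only, not every variant PowerShell accepts.

```python
import base64
import re

# -e, -ec, -enc, -encodedcommand (and other "-enc..." prefixes), case-insensitive
ENC_ARG = re.compile(r"-e(?:c|nc\w*)?\s+([A-Za-z0-9+/]+={0,2})", re.IGNORECASE)


def decode_encoded_commands(command_line: str) -> list:
    """Return the decoded text of every -EncodedCommand argument found."""
    decoded = []
    for m in ENC_ARG.finditer(command_line):
        try:
            raw = base64.b64decode(m.group(1), validate=True)
            decoded.append(raw.decode("utf-16-le"))  # PowerShell uses UTF-16LE
        except (ValueError, UnicodeDecodeError):
            continue  # bad padding or not UTF-16LE text: leave for manual review
    return decoded
```

Decoded output frequently contains another encoded or compressed stage, so the function is usually applied repeatedly.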

### Task 5: IOC Extraction
**Input**: Malware sample

**Process**:
1. Extract network IOCs
2. Extract file IOCs
3. Extract registry IOCs
4. Deduplicate and validate
5. Export in multiple formats

**Output**: IOC collection

**Example**:
```python
from malware_forensics import IOCExtractor

# Initialize extractor
extractor = IOCExtractor("/samples/malware.exe")

# Extract all IOCs
iocs = extractor.extract_all()

# Network IOCs
print(f"URLs ({len(iocs.urls)}):")
for url in iocs.urls:
    print(f"  {url.value}")
    print(f"    Context: {url.context}")
    print(f"    Confidence: {url.confidence}")

print(f"Domains ({len(iocs.domains)}):")
for domain in iocs.domains:
    print(f"  {domain.value}")

print(f"IPs ({len(iocs.ips)}):")
for ip in iocs.ips:
    print(f"  {ip.value}")
    print(f"    Type: {ip.ip_type}")  # C2, download, etc.

# File IOCs
print(f"File paths ({len(iocs.file_paths)}):")
for path in iocs.file_paths:
    print(f"  {path.value}")

print(f"File hashes ({len(iocs.hashes)}):")
for h in iocs.hashes:
    print(f"  {h.algorithm}: {h.value}")

# Registry IOCs
print(f"Registry keys ({len(iocs.registry_keys)}):")
for reg in iocs.registry_keys:
    print(f"  {reg.value}")

# Mutexes
print(f"Mutexes ({len(iocs.mutexes)}):")
for mutex in iocs.mutexes:
    print(f"  {mutex.value}")

# Validate IOCs
validated = extractor.validate_iocs(iocs)
print(f"Valid IOCs: {validated.valid_count}")
print(f"Invalid IOCs: {validated.invalid_count}")

# Enrich IOCs
enriched = extractor.enrich_iocs(
    iocs,
    sources=["virustotal", "threatfox", "urlhaus"]
)

# Export IOCs
extractor.export_csv("/evidence/iocs.csv")
extractor.export_json("/evidence/iocs.json")
extractor.export_stix("/evidence/iocs.stix")
extractor.export_misp("/evidence/iocs.misp.json")
```
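The network part of step 1 is largely pattern matching, with step 4 (validation) weeding out look-alikes such as out-of-range IP addresses or file names that resemble domains. The sketch below is a simplified, hypothetical version (`extract_iocs`, `refang`, `defang` are not the skill's API); it refangs text like `hxxp://evil[.]example[.]com` before matching and offers `defang` for reporting. The file-extension filter is a heuristic and will miss or wrongly drop some domains.

```python
import ipaddress
import re

URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b", re.IGNORECASE)
# Heuristic: final labels far more likely to be file extensions than TLDs
FILE_EXTENSIONS = {"exe", "dll", "sys", "php", "ps1", "bat", "vbs", "tmp", "dat", "bin", "txt"}


def refang(text: str) -> str:
    """Undo common defanging so defanged reports can be re-parsed."""
    text = re.sub(r"\bhxxp", "http", text, flags=re.IGNORECASE)
    return text.replace("[.]", ".").replace("[:]", ":")


def defang(ioc: str) -> str:
    """Make an IOC non-clickable for reports and tickets."""
    return re.sub(r"^http", "hxxp", ioc, flags=re.IGNORECASE).replace(".", "[.]")


def extract_iocs(text: str) -> dict:
    text = refang(text)
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        ips.add(candidate)
    domains = {d.lower() for d in DOMAIN_RE.findall(text)
               if d.rsplit(".", 1)[-1].lower() not in FILE_EXTENSIONS}
    return {"urls": sorted(set(URL_RE.findall(text))),
            "ips": sorted(ips),
            "domains": sorted(domains)}
```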

### Task 6: YARA Scanning
**Input**: Malware sample(s) and YARA rules

**Process**:
1. Compile YARA rules
2. Scan samples
3. Collect matches
4. Document findings
5. Generate report

**Output**: YARA scan results

**Example**:
```python
from malware_forensics import YARAScanner

# Initialize scanner
scanner = YARAScanner()

# Add rule files
scanner.add_rules("/rules/malware_families.yar")
scanner.add_rules("/rules/packers.yar")
scanner.add_rules("/rules/exploits.yar")

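# Alternatively, load every rule file in a directory
# (use instead of the add_rules calls above; loading a file twice can produce duplicate rule identifiers)
scanner.add_rule_directory("/rules/")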

# Scan single file
matches = scanner.scan_file("/samples/malware.exe")
for match in matches:
    print(f"Rule: {match.rule}")
    print(f"  Namespace: {match.namespace}")
    print(f"  Tags: {match.tags}")
    print(f"  Meta: {match.meta}")
    print(f"  Strings matched:")
    for s in match.strings:
        print(f"    {s.identifier}: {s.data} @ 0x{s.offset:x}")

# Scan directory
dir_matches = scanner.scan_directory("/samples/")
for file_path, matches in dir_matches.items():
    if matches:
        print(f"File: {file_path}")
        for match in matches:
            print(f"  - {match.rule}")

# Scan with specific rules
specific = scanner.scan_file(
    "/samples/malware.exe",
    rules=["APT_Malware", "Ransomware"]
)

# Get statistics
stats = scanner.get_statistics()
print(f"Files scanned: {stats.files_scanned}")
print(f"Rules loaded: {stats.rules_loaded}")
print(f"Matches found: {stats.total_matches}")

# Export results
scanner.export_results("/evidence/yara_results.json")
scanner.generate_report("/evidence/yara_report.html")
```
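The example above loads three files from `/rules/` and then the whole `/rules/` directory, so the same rules may be compiled twice. If you drive yara-python directly, `yara.compile` accepts a `filepaths` dictionary mapping a namespace to each rule file. The hypothetical helper below builds that mapping while loading each file only once and giving same-named files in different folders distinct namespaces.

```python
from pathlib import Path


def collect_rule_files(*sources: str) -> dict:
    """Map a unique namespace to each .yar/.yara file, loading every file once."""
    filepaths = {}
    seen = set()
    for source in sources:
        root = Path(source)
        if root.is_dir():
            files = sorted(p for p in root.rglob("*")
                           if p.is_file() and p.suffix.lower() in (".yar", ".yara"))
        else:
            files = [root]
        for rule_file in files:
            resolved = rule_file.resolve()
            if resolved in seen:
                continue  # e.g., passed directly and also found in a directory
            seen.add(resolved)
            namespace, n = rule_file.stem, 2
            while namespace in filepaths:  # same file name in different folders
                namespace, n = f"{rule_file.stem}_{n}", n + 1
            filepaths[namespace] = str(resolved)
    return filepaths
```

With yara-python installed, the result can be passed as `yara.compile(filepaths=collect_rule_files("/rules/"))` and the compiled rules applied with `rules.match(path)`.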

### Task 7: Behavior Analysis
**Input**: Malware execution observations

**Process**:
1. Document file operations
2. Track registry changes
3. Monitor network activity
4. Identify persistence
5. Map to MITRE ATT&CK

**Output**: Behavior analysis report

**Example**:
```python
from malware_forensics import BehaviorAnalyzer

# Initialize analyzer with sandbox report
analyzer = BehaviorAnalyzer()
analyzer.load_sandbox_report("/evidence/sandbox_report.json")

# Or manually add observations
analyzer.add_file_operation("create", "C:\\Windows\\Temp\\malware.exe")
analyzer.add_registry_operation("create", "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\Malware")
analyzer.add_network_connection("tcp", "203.0.113.50", 443)
analyzer.add_process_creation("cmd.exe", "/c whoami")

# Get file operations
file_ops = analyzer.get_file_operations()
for op in file_ops:
    print(f"File: {op.operation} - {op.path}")
    print(f"  Time: {op.timestamp}")

# Get registry operations
reg_ops = analyzer.get_registry_operations()
for op in reg_ops:
    print(f"Registry: {op.operation} - {op.key}")
    print(f"  Value: {op.value}")

# Get network activity
network = analyzer.get_network_activity()
for conn in network:
    print(f"Network: {conn.protocol} {conn.destination}:{conn.port}")
    print(f"  DNS: {conn.dns_query}")

# Get process activity
processes = analyzer.get_process_activity()
for proc in processes:
    print(f"Process: {proc.name}")
    print(f"  Command: {proc.command_line}")
    print(f"  Parent: {proc.parent}")

# Map to MITRE ATT&CK
mitre = analyzer.map_to_mitre()
for technique in mitre:
    print(f"Technique: {technique.id} - {technique.name}")
    print(f"  Tactic: {technique.tactic}")
    print(f"  Evidence: {technique.evidence}")

# Identify capabilities
capabilities = analyzer.identify_capabilities()
for cap in capabilities:
    print(f"Capability: {cap.name}")
    print(f"  Description: {cap.description}")
    print(f"  Confidence: {cap.confidence}")

# Generate behavior report
analyzer.generate_report("/evidence/behavior_analysis.html")
```
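Step 5 (mapping to MITRE ATT&CK) can be pictured as a rule table that matches observations to techniques and keeps the matching observations as evidence. The sketch below is a deliberately tiny, hypothetical table (`Technique`, `RULES`, `map_to_attack` are illustrative names) covering the manual observations from the example; the technique IDs are real ATT&CK entries, but a production mapping needs many more rules and analyst review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    id: str
    name: str
    tactic: str


# Hypothetical rule table: (observation kind, predicate, technique)
RULES = [
    ("registry", lambda v: "\\currentversion\\run" in v.lower(),
     Technique("T1547.001", "Registry Run Keys / Startup Folder", "Persistence")),
    ("process", lambda v: (v.lower().split() or [""])[0].endswith("cmd.exe"),
     Technique("T1059.003", "Windows Command Shell", "Execution")),
    ("process", lambda v: "whoami" in v.lower(),
     Technique("T1033", "System Owner/User Discovery", "Discovery")),
    # A port number alone is weak evidence; confirm the protocol from traffic
    ("network", lambda v: v.rsplit(":", 1)[-1] in ("80", "443"),
     Technique("T1071.001", "Web Protocols", "Command and Control")),
]


def map_to_attack(observations) -> dict:
    """Return {Technique: [evidence, ...]} for (kind, value) observations."""
    hits = {}
    for kind, value in observations:
        for rule_kind, matches, technique in RULES:
            if kind == rule_kind and matches(value):
                hits.setdefault(technique, []).append(value)
    return hits
```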

### Task 8: Sample Comparison
**Input**: Multiple malware samples

**Process**:
1. Calculate similarity hashes
2. Compare code sections
3. Identify shared IOCs
4. Find common patterns
5. Cluster related samples

**Output**: Sample comparison results

**Example**:
```python
from malware_forensics import SampleComparator

# Initialize comparator
comparator = SampleComparator()

# Add samples
comparator.add_sample("/samples/sample1.exe")
comparator.add_sample("/samples/sample2.exe")
comparator.add_sample("/samples/sample3.exe")

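# Alternatively, add every sample in a directory
# (use instead of the add_sample calls above to avoid adding the same files twice)
comparator.add_directory("/samples/")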

# Compare all samples
comparison = comparator.compare_all()

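# Get similarity matrix and report each highly similar pair once
matrix = comparison.similarity_matrix
for sample1, similarities in matrix.items():
    for sample2, score in similarities.items():
        # Skip self-comparisons and the mirrored (sample2, sample1) entry
        if sample1 < sample2 and score > 0.8:
            print(f"{sample1} <-> {sample2}: {score:.2f}")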

# Find clusters
clusters = comparator.cluster_samples(threshold=0.7)
for i, cluster in enumerate(clusters):
    print(f"Cluster {i + 1}:")
    for sample in cluster:
        print(f"  - {sample}")

# Compare specific samples
detail = comparator.compare_pair("/samples/a.exe", "/samples/b.exe")
print(f"Overall similarity: {detail.overall_score}")
print(f"Section similarity: {detail.section_scores}")
print(f"Import similarity: {detail.import_score}")
print(f"String similarity: {detail.string_score}")
print(f"Shared IOCs: {detail.shared_iocs}")

# Find shared code
shared_code = comparator.find_shared_code()
for code in shared_code:
    print(f"Shared code block at offset 0x{code.offset:x}")
    print(f"  Size: {code.size}")
    print(f"  Samples: {code.samples}")

# Generate comparison report
comparator.generate_report("/evidence/comparison_report.html")
```
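The process above starts from similarity hashes (ssdeep is listed as an optional dependency). As a simpler, dependency-free stand-in, the sketch below scores pairs with Jaccard similarity over feature sets (here, hypothetical `dll!function` import strings) and groups samples by single-linkage clustering with union-find: any pair at or above the threshold joins the same cluster. `jaccard` and `cluster` are illustrative names, not the `SampleComparator` internals.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(features: dict, threshold: float = 0.7) -> list:
    """Single-linkage clusters of sample names whose feature sets are similar."""
    parent = {name: name for name in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage can chain loosely related samples together through intermediates, so large clusters deserve a manual look.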

### Task 9: Attribution Analysis
**Input**: Malware sample with IOCs

**Process**:
1. Match against known families
2. Compare with threat intel
3. Identify TTPs
4. Find related campaigns
5. Document attribution

**Output**: Attribution analysis

**Example**:
```python
from malware_forensics import AttributionAnalyzer

# Initialize analyzer
analyzer = AttributionAnalyzer("/samples/malware.exe")

# Match against malware families
families = analyzer.match_malware_families()
for family in families:
    print(f"Family: {family.name}")
    print(f"  Confidence: {family.confidence}")
    print(f"  Matching indicators: {family.indicators}")

# Check against threat intel
threat_intel = analyzer.check_threat_intel(
    feeds=["malwarebazaar", "virustotal", "threatfox"]
)
for intel in threat_intel:
    print(f"Intel: {intel.source}")
    print(f"  Family: {intel.family}")
    print(f"  Tags: {intel.tags}")
    print(f"  First seen: {intel.first_seen}")

# Match TTPs to threat actors
actors = analyzer.match_threat_actors()
for actor in actors:
    print(f"Threat Actor: {actor.name}")
    print(f"  Confidence: {actor.confidence}")
    print(f"  Matching TTPs: {actor.matching_ttps}")
    print(f"  Known aliases: {actor.aliases}")

# Find related campaigns
campaigns = analyzer.find_related_campaigns()
for campaign in campaigns:
    print(f"Campaign: {campaign.name}")
    print(f"  Time range: {campaign.start_date} - {campaign.end_date}")
    print(f"  Targets: {campaign.targets}")

# Get attribution summary
summary = analyzer.get_attribution_summary()
print(f"Most likely family: {summary.primary_family}")
print(f"Most likely actor: {summary.primary_actor}")
print(f"Confidence: {summary.overall_confidence}")

# Generate attribution report
analyzer.generate_report("/evidence/attribution.html")
```
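Matching TTPs to threat actors can be approximated by ranking actor profiles by how much their technique sets overlap with what was observed. The sketch below uses Jaccard overlap; the actor names and profiles in the test are placeholders, not real threat intelligence, and `rank_by_ttp_overlap` is a hypothetical helper. Common techniques such as command-shell execution are shared by many actors, so a high score is a lead for further research, not an attribution.

```python
def rank_by_ttp_overlap(observed: set, profiles: dict) -> list:
    """Rank (actor, score, shared_ttps) by Jaccard overlap with observed TTPs."""
    ranked = []
    for actor, ttps in profiles.items():
        shared = observed & ttps
        if not shared:
            continue  # no overlap at all: not a candidate
        score = len(shared) / len(observed | ttps)
        ranked.append((actor, round(score, 2), shared))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked
```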

### Task 10: Malware Triage
**Input**: Collection of suspicious files

**Process**:
1. Calculate hashes
2. Check against known malware
3. Quick static analysis
4. Prioritize for analysis
5. Generate triage report

**Output**: Triage results with priorities

**Example**:
```python
from malware_forensics import MalwareTriage

# Initialize triage
triage = MalwareTriage()

# Add samples
triage.add_directory("/quarantine/")

# Run triage
results = triage.run()

print(f"Total samples: {results.total}")
print(f"Known malware: {results.known_malware}")
print(f"Suspicious: {results.suspicious}")
print(f"Clean: {results.clean}")
print(f"Unknown: {results.unknown}")

# Get prioritized list
prioritized = triage.get_prioritized()
for sample in prioritized:
    print(f"Priority {sample.priority}: {sample.filename}")
    print(f"  Status: {sample.status}")
    print(f"  Reason: {sample.reason}")
    print(f"  Risk score: {sample.risk_score}")

# Get known malware
known = triage.get_known_malware()
for m in known:
    print(f"Known: {m.filename}")
    print(f"  Family: {m.family}")
    print(f"  Detection: {m.detection_name}")

# Get suspicious files
suspicious = triage.get_suspicious()
for s in suspicious:
    print(f"Suspicious: {s.filename}")
    print(f"  Indicators: {s.indicators}")

# Export triage results
triage.export_csv("/evidence/triage_results.csv")
triage.generate_report("/evidence/triage_report.html")
```
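Steps 1, 2, and 4 of triage boil down to hashing every file, looking the hash up in a set of known-bad hashes (for example an internal blocklist or a hash export from a service such as MalwareBazaar), and sorting by a priority policy. The sketch below assumes a simple, hypothetical policy: known malware first for containment, then unknown Windows executables (files starting with `MZ`), then everything else. `sha256_file` and `triage_directory` are illustrative names.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so large samples are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage_directory(directory, known_bad: dict) -> list:
    """Hash every file, flag known-bad hashes, and order results by priority."""
    results = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        sha256 = sha256_file(path)
        with open(path, "rb") as f:
            is_pe = f.read(2) == b"MZ"
        family = known_bad.get(sha256)
        if family:
            status, priority = "known_malware", 1       # contain first
        elif is_pe:
            status, priority = "unknown_executable", 2  # candidate for deeper analysis
        else:
            status, priority = "unknown", 3
        results.append({"path": str(path), "sha256": sha256, "status": status,
                        "family": family, "priority": priority})
    return sorted(results, key=lambda r: (r["priority"], r["path"]))
```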

## Configuration

### Environment Variables
| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `YARA_RULES_PATH` | Default YARA rules directory | No | ./rules |
| `VT_API_KEY` | VirusTotal API key | No | None |
| `MALWARE_BAZAAR_KEY` | MalwareBazaar API key | No | None |
| `SANDBOX_API` | Sandbox service API URL | No | None |
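
How the skill itself reads these variables is not documented here; the sketch below is a hypothetical loader (`ForensicsConfig`, `load_config`) that applies the defaults from the table and treats empty values as unset, so enrichment steps can be skipped cleanly when no API key is configured.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ForensicsConfig:
    yara_rules_path: str
    vt_api_key: Optional[str]
    malware_bazaar_key: Optional[str]
    sandbox_api: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> ForensicsConfig:
    """Read the documented variables, applying the table's defaults."""
    env = os.environ if env is None else env
    return ForensicsConfig(
        yara_rules_path=env.get("YARA_RULES_PATH", "./rules"),
        # Empty strings count as unset so lookups are skipped, not attempted
        vt_api_key=env.get("VT_API_KEY") or None,
        malware_bazaar_key=env.get("MALWARE_BAZAAR_KEY") or None,
        sandbox_api=env.get("SANDBOX_API") or None,
    )
```

Passing a dictionary instead of `os.environ` keeps the loader easy to test.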

### Options
| Option | Type | Description |
|--------|------|-------------|
| `auto_deobfuscate` | boolean | Auto-deobfuscate scripts |
| `extract_resources` | boolean | Extract PE resources |
| `deep_string_analysis` | boolean | Extended string analysis |
| `check_threat_intel` | boolean | Check against threat intel |
| `parallel` | boolean | Enable parallel processing |

## Examples

### Example 1: Incident Response Analysis
**Scenario**: Analyzing malware recovered from a compromised system

```python
from malware_forensics import MalwareAnalyzer, IOCExtractor

# Analyze malware sample
analyzer = MalwareAnalyzer("/evidence/malware.exe")
analysis = analyzer.full_analysis()

# Extract IOCs for blocking
extractor = IOCExtractor("/evidence/malware.exe")
iocs = extractor.extract_all()

# Export for SIEM
extractor.export_stix("/evidence/block_iocs.stix")

# Generate IR report
analyzer.generate_ir_report("/evidence/malware_ir_report.html")
```
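The exact layout `export_stix()` writes is not specified above. For reference, a minimal STIX 2.1 bundle of indicator objects, carrying the properties the OASIS specification requires for indicators (`type`, `spec_version`, `id`, `created`, `modified`, `pattern`, `pattern_type`, `valid_from`), can be built with the standard library as sketched below. `make_stix_bundle` and its `(kind, value)` input format are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

# STIX patterning templates for a few common observable types
PATTERN_TEMPLATES = {
    "ipv4": "[ipv4-addr:value = '{}']",
    "domain": "[domain-name:value = '{}']",
    "url": "[url:value = '{}']",
    "sha256": "[file:hashes.'SHA-256' = '{}']",
}


def _escape(value: str) -> str:
    """Escape backslashes and single quotes inside STIX pattern string literals."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def make_stix_bundle(iocs: list) -> dict:
    """Build a STIX 2.1 bundle with one indicator per (kind, value) pair."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    objects = []
    for kind, value in iocs:
        objects.append({
            "type": "indicator",
            "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "name": f"{kind}: {value}",
            "pattern": PATTERN_TEMPLATES[kind].format(_escape(value)),
            "pattern_type": "stix",
            "valid_from": now,
        })
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}
```

Writing the result with `json.dump(bundle, f, indent=2)` to a `.json` file is usually what SIEM and TIP importers expect.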

### Example 2: Threat Intelligence
**Scenario**: Analyzing a new malware sample for threat intelligence

```python
from malware_forensics import MalwareAnalyzer, AttributionAnalyzer

# Full analysis
analyzer = MalwareAnalyzer("/samples/new_sample.exe")
analysis = analyzer.full_analysis()

# Attribution
attribution = AttributionAnalyzer("/samples/new_sample.exe")
actors = attribution.match_threat_actors()
campaigns = attribution.find_related_campaigns()

# Generate threat intel report
analyzer.generate_threat_intel_report("/evidence/threat_intel.html")
```

## Limitations

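- Static analysis cannot observe runtime behavior; capabilities inferred from imports and strings need confirmation through dynamic analysis
- Packed samples may require manual unpacking
- Heavily obfuscated code may resist automated deobfuscation
- Attribution is inherently uncertain: shared tooling, leaked source code, and false flags can all mislead family and actor matches
- Samples must be handled in an isolated analysis environment, never on production systems
- Some formats may have limited support
- Threat intelligence lookups only reflect what the queried feeds contain; no match does not mean a sample is benign or new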

## Troubleshooting

### Common Issue 1: PE Parsing Failure
**Problem**: Cannot parse PE file
**Solution**:
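- Verify file integrity by comparing the hash against the value recorded at collection
- The file may be packed, truncated, or deliberately malformed to break parsers
- Retry with a different parser (e.g., `pefile` directly) to tell whether the failure is parser-specific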
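
A quick way to test whether a sample is packed is byte entropy: compressed or encrypted data approaches 8 bits per byte, while ordinary code and text score noticeably lower. The sketch below computes Shannon entropy and flags high-entropy windows; `shannon_entropy` and `high_entropy_windows` are illustrative helpers, and the 7.2 threshold is a common rule of thumb rather than a fixed standard.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def high_entropy_windows(data: bytes, window: int = 4096, threshold: float = 7.2) -> list:
    """Offsets of fixed-size windows whose entropy meets the threshold."""
    return [off for off in range(0, len(data), window)
            if shannon_entropy(data[off:off + window]) >= threshold]
```

High entropy alone does not prove packing: embedded images, compressed resources, and certificates also score high.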

### Common Issue 2: Deobfuscation Failure
**Problem**: Script remains obfuscated
**Solution**:
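- Deobfuscate manually, one stage at a time
- Run the script in an isolated sandbox and capture the deobfuscated code at runtime (for PowerShell, Script Block Logging records it as event ID 4104)
- Check for custom obfuscation such as XOR encoding or character-code arithmetic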
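
Single-byte XOR is among the most common custom obfuscation schemes, and with only 255 possible keys it can be brute-forced by searching each candidate output for known plaintext ("cribs") such as `http://` or the DOS stub text `This program`. The sketch below assumes single-byte keys only; multi-byte or rolling XOR needs a different approach. `xor_bytes` and `brute_force_single_byte_xor` are hypothetical helpers, and the loop is quadratic enough that it suits short blobs rather than whole files.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def brute_force_single_byte_xor(data: bytes, cribs=(b"http://", b"This program")) -> list:
    """Try keys 0x01-0xFF and report (key, crib) wherever known plaintext appears."""
    hits = []
    for key in range(1, 256):
        decoded = xor_bytes(data, key)
        for crib in cribs:
            if crib in decoded:
                hits.append((key, crib))
    return hits
```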

### Common Issue 3: YARA Rule Errors
**Problem**: YARA rules fail to compile
**Solution**:
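- Check rule syntax; the compiler error message normally names the offending line
- Verify string escape sequences
- Check for rule files loaded twice, which produces duplicate identifier errors
- Update YARA/yara-python if the rules use newer modules or syntax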

## Related Skills

- [memory-forensics](../memory-forensics/): Memory-based malware analysis
- [disk-forensics](../disk-forensics/): Find malware artifacts on disk
- [network-forensics](../network-forensics/): Analyze malware traffic
- [timeline-forensics](../timeline-forensics/): Malware timeline integration
- [artifact-collection](../artifact-collection/): Sample collection procedures

## References

- [Malware Analysis Reference](references/REFERENCE.md)
- [YARA Rule Writing Guide](references/YARA_GUIDE.md)
- [PE Format Specification](references/PE_FORMAT.md)