---
name: artifact-collection
description: |
  Collect and preserve digital forensic artifacts from systems and devices. Use when
  responding to incidents, collecting evidence for investigation, or preserving
  volatile data. Supports Windows, Linux, and macOS artifact collection with chain-of-custody documentation.
license: Apache-2.0
compatibility: |
  - Python 3.9+
  - Optional: volatility3, psutil, wmi
metadata:
  author: SherifEldeeb
  version: "1.0.0"
  category: forensics
---

# Artifact Collection

A skill for gathering and preserving digital forensic evidence. It supports systematic collection of volatile and non-volatile artifacts from endpoints while maintaining chain of custody and forensic integrity throughout the collection process.

## Capabilities

- **Volatile Data Collection**: Capture RAM, running processes, network connections
- **Disk Artifact Collection**: Collect registry, event logs, browser data
- **Log Collection**: Gather system, application, and security logs
- **Configuration Collection**: Capture system configuration and state
- **Evidence Packaging**: Package artifacts with integrity verification
- **Chain of Custody**: Document and maintain evidence chain of custody
- **Remote Collection**: Collect artifacts from remote systems
- **Triage Collection**: Quick artifact collection for rapid response
- **Selective Collection**: Target specific artifact types
- **Collection Verification**: Verify collected artifact integrity

## Quick Start

```python
from artifact_collection import WindowsCollector, ChainOfCustody

# Initialize collector
collector = WindowsCollector(output_dir="/evidence/case001/")

# Collect volatile artifacts
collector.collect_volatile()

# Collect disk artifacts
collector.collect_disk_artifacts()

# Generate chain of custody
coc = ChainOfCustody(collector)
coc.generate_report("/evidence/case001/chain_of_custody.pdf")
```

## Usage

### Task 1: Volatile Data Collection
**Input**: Target system (local or remote)

**Process**:
1. Document system state
2. Capture memory dump
3. Collect running processes
4. Capture network connections
5. Preserve volatile artifacts

**Output**: Volatile artifacts with documentation

**Example**:
```python
from artifact_collection import VolatileCollector

# Initialize collector
collector = VolatileCollector(
    output_dir="/evidence/case001/volatile/",
    case_id="CASE-2024-001",
    examiner="John Doe"
)

# Collect memory dump
memory = collector.collect_memory()
print(f"Memory dump: {memory.path}")
print(f"Size: {memory.size_gb}GB")
print(f"Hash: {memory.sha256}")
print(f"Tool: {memory.acquisition_tool}")

# Collect running processes
processes = collector.collect_processes()
for proc in processes:
    print(f"PID {proc.pid}: {proc.name}")
    print(f"  Path: {proc.exe_path}")
    print(f"  User: {proc.username}")
    print(f"  Command: {proc.command_line}")
    print(f"  Start: {proc.start_time}")

# Collect network connections
connections = collector.collect_network_connections()
for conn in connections:
    print(f"{conn.local_addr}:{conn.local_port} -> "
          f"{conn.remote_addr}:{conn.remote_port}")
    print(f"  PID: {conn.pid}")
    print(f"  State: {conn.state}")
    print(f"  Protocol: {conn.protocol}")

# Collect network interfaces
interfaces = collector.collect_network_interfaces()
for iface in interfaces:
    print(f"Interface: {iface.name}")
    print(f"  IP: {iface.ip_address}")
    print(f"  MAC: {iface.mac_address}")

# Collect DNS cache
dns_cache = collector.collect_dns_cache()

# Collect ARP cache
arp_cache = collector.collect_arp_cache()

# Collect clipboard
clipboard = collector.collect_clipboard()

# Collect environment variables
env_vars = collector.collect_environment_variables()

# Generate collection report
collector.generate_report("/evidence/case001/volatile_report.html")
```
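
Step 1 of the process ("Document system state") can be illustrated with the standard library alone. The sketch below is not part of `artifact_collection`; `snapshot_system_state` is a hypothetical helper that records basic host facts with a UTC timestamp before any other collection step runs.

```python
import json
import platform
import socket
from datetime import datetime, timezone


def snapshot_system_state() -> dict:
    """Record basic host facts with a UTC timestamp (hypothetical helper)."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        # Timezone-aware UTC keeps timestamps comparable across systems.
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_system_state(), indent=2))
```

Storing the snapshot next to the collected artifacts gives later steps a fixed reference point for when and where collection began.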

### Task 2: Windows Artifact Collection
**Input**: Windows system

**Process**:
1. Collect registry hives
2. Collect event logs
3. Collect prefetch files
4. Collect browser artifacts
5. Package with hashes

**Output**: Windows artifacts with documentation

**Example**:
```python
from artifact_collection import WindowsCollector

# Initialize Windows collector
collector = WindowsCollector(
    output_dir="/evidence/case001/windows/",
    case_id="CASE-2024-001"
)

# Collect registry hives
registry = collector.collect_registry()
for hive in registry:
    print(f"Registry: {hive.name}")
    print(f"  Path: {hive.source_path}")
    print(f"  Hash: {hive.sha256}")

# Collect event logs
event_logs = collector.collect_event_logs()
for log in event_logs:
    print(f"Event Log: {log.name}")
    print(f"  Records: {log.record_count}")
    print(f"  Hash: {log.sha256}")

# Collect prefetch files
prefetch = collector.collect_prefetch()
print(f"Prefetch files: {len(prefetch)}")

# Collect Amcache
amcache = collector.collect_amcache()

# Collect SRUM database
srum = collector.collect_srum()

# Collect scheduled tasks
tasks = collector.collect_scheduled_tasks()

# Collect services
services = collector.collect_services()

# Collect startup items
startup = collector.collect_startup_items()

# Collect browser data
browsers = collector.collect_browser_artifacts()
for browser in browsers:
    print(f"Browser: {browser.name}")
    print(f"  History: {browser.history_count}")
    print(f"  Downloads: {browser.download_count}")

# Collect USB history
usb = collector.collect_usb_history()

# Collect recent files
recent = collector.collect_recent_files()

# Collect Jump Lists
jumplists = collector.collect_jumplists()

# Generate collection manifest
collector.generate_manifest("/evidence/case001/windows_manifest.json")
```
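
The "Package with hashes" step amounts to computing a digest for every collected file. A minimal sketch follows, assuming SHA-256 and chunked reads so that large artifacts do not need to fit in memory. `sha256_file` and `hash_directory` are illustrative names, not functions of this skill.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

The resulting mapping can be serialized as a manifest and compared against a later re-hash of the same directory.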

### Task 3: Linux Artifact Collection
**Input**: Linux system

**Process**:
1. Collect system logs
2. Collect user artifacts
3. Collect configuration files
4. Collect authentication data
5. Package artifacts

**Output**: Linux artifacts with documentation

**Example**:
```python
from artifact_collection import LinuxCollector

# Initialize Linux collector
collector = LinuxCollector(
    output_dir="/evidence/case001/linux/",
    case_id="CASE-2024-001"
)

# Collect system logs
logs = collector.collect_system_logs()
for log in logs:
    print(f"Log: {log.name}")
    print(f"  Path: {log.path}")
    print(f"  Size: {log.size}")

# Collect auth logs
auth = collector.collect_auth_logs()

# Collect user home directories
homes = collector.collect_user_homes()
for home in homes:
    print(f"User: {home.username}")
    print(f"  Bash history: {home.bash_history}")
    print(f"  SSH keys: {home.ssh_keys}")

# Collect cron jobs
cron = collector.collect_cron_jobs()
for job in cron:
    print(f"Cron: {job.user} - {job.schedule}")
    print(f"  Command: {job.command}")

# Collect systemd units
systemd = collector.collect_systemd_units()

# Collect network configuration
network = collector.collect_network_config()

# Collect installed packages
packages = collector.collect_installed_packages()

# Collect SSH configuration
ssh = collector.collect_ssh_config()

# Collect web server logs (if present)
web_logs = collector.collect_web_logs()

# Collect Docker artifacts (if present)
docker = collector.collect_docker_artifacts()

# Generate collection report
collector.generate_report("/evidence/case001/linux_report.html")
```
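
The example prints a `schedule` and a `command` for each cron job. How a crontab line might be split into those two parts is sketched below. This is a simplified illustration that assumes the per-user crontab format (five time fields, then the command) and does not handle the extra user column found in `/etc/crontab`. `CronEntry` and `parse_crontab_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class CronEntry(NamedTuple):
    schedule: str
    command: str


def parse_crontab_line(line: str) -> Optional[CronEntry]:
    """Parse one user-crontab line; return None for non-job lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    if stripped.startswith("@"):
        # Shortcuts such as @reboot or @daily replace the five time fields.
        parts = stripped.split(None, 1)
        return CronEntry(parts[0], parts[1]) if len(parts) == 2 else None
    parts = stripped.split(None, 5)
    if len(parts) < 6:
        # Too few fields: e.g. "MAILTO=root" or a malformed entry.
        return None
    return CronEntry(" ".join(parts[:5]), parts[5])
```

`@reboot` entries and commands that pipe downloads into a shell are common persistence patterns, so keeping the command string intact matters for later review.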

### Task 4: macOS Artifact Collection
**Input**: macOS system

**Process**:
1. Collect system logs
2. Collect user data
3. Collect application artifacts
4. Collect security data
5. Package artifacts

**Output**: macOS artifacts with documentation

**Example**:
```python
from artifact_collection import MacOSCollector

# Initialize macOS collector
collector = MacOSCollector(
    output_dir="/evidence/case001/macos/",
    case_id="CASE-2024-001"
)

# Collect unified logs
unified = collector.collect_unified_logs()

# Collect FSEvents
fsevents = collector.collect_fsevents()

# Collect user artifacts
users = collector.collect_user_artifacts()
for user in users:
    print(f"User: {user.username}")
    print(f"  Recent items: {len(user.recent_items)}")
    print(f"  Downloads: {len(user.downloads)}")

# Collect Spotlight data
spotlight = collector.collect_spotlight()

# Collect Keychain data (metadata only)
keychain = collector.collect_keychain_metadata()

# Collect LaunchAgents/Daemons
launch_items = collector.collect_launch_items()
for item in launch_items:
    print(f"Launch item: {item.name}")
    print(f"  Path: {item.path}")
    print(f"  Program: {item.program}")

# Collect quarantine events
quarantine = collector.collect_quarantine_events()
for q in quarantine:
    print(f"Quarantine: {q.filename}")
    print(f"  URL: {q.origin_url}")
    print(f"  Date: {q.quarantine_date}")

# Collect Safari data
safari = collector.collect_safari_artifacts()

# Collect Terminal history
terminal = collector.collect_terminal_history()

# Collect installed applications
apps = collector.collect_installed_apps()

# Generate report
collector.generate_report("/evidence/case001/macos_report.html")
```
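
LaunchAgents and LaunchDaemons are property lists, so the `name`, `path`, and `program` fields printed above can be derived with the standard `plistlib` module. The sketch below is an assumption about how such parsing could look, not the skill's implementation. `summarize_launch_item` is a hypothetical helper. `Label`, `Program`, `ProgramArguments`, and `RunAtLoad` are standard launchd keys.

```python
import plistlib
from typing import Optional


def summarize_launch_item(plist_bytes: bytes) -> dict:
    """Extract the label and the program a launchd job would run."""
    data = plistlib.loads(plist_bytes)  # handles XML and binary plists
    args = data.get("ProgramArguments") or []
    # launchd uses Program if present, otherwise the first ProgramArguments entry.
    program: Optional[str] = data.get("Program") or (args[0] if args else None)
    return {
        "label": data.get("Label"),
        "program": program,
        "arguments": list(args),
        "run_at_load": bool(data.get("RunAtLoad", False)),
    }
```

Jobs with `RunAtLoad` set and a program path outside the usual system locations are natural candidates for closer inspection.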

### Task 5: Remote Artifact Collection
**Input**: Remote system credentials

**Process**:
1. Establish secure connection
2. Deploy collection agent
3. Collect artifacts remotely
4. Transfer with integrity check
5. Document collection

**Output**: Remote artifacts with verification

**Example**:
```python
from artifact_collection import RemoteCollector

# Initialize remote collector
collector = RemoteCollector(
    target="192.168.1.100",
    credentials={
        "username": "admin",
        "method": "key",
        "key_path": "/path/to/key"
    },
    output_dir="/evidence/case001/remote/"
)

# Connect to remote system
connection = collector.connect()
print(f"Connected: {connection.hostname}")
print(f"OS: {connection.os_type}")

# Collect volatile data first
volatile = collector.collect_volatile()
print(f"Memory collected: {volatile.memory_path}")
print(f"Processes: {len(volatile.processes)}")

# Collect disk artifacts
disk = collector.collect_disk_artifacts(
    artifact_types=["registry", "eventlogs", "browser"]
)

# Transfer artifacts securely
transfer = collector.transfer_artifacts()
for artifact in transfer:
    print(f"Transferred: {artifact.name}")
    print(f"  Size: {artifact.size}")
    print(f"  Local hash: {artifact.local_hash}")
    print(f"  Remote hash: {artifact.remote_hash}")
    print(f"  Verified: {artifact.verified}")

# Disconnect
collector.disconnect()

# Generate collection report
collector.generate_report("/evidence/case001/remote_report.html")
```
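
The "Transfer with integrity check" step compares a hash taken at the source with one taken at the destination. The sketch below uses a local file copy as a stand-in for the network transfer. In a real remote collection the source hash would be computed on the remote host before the data leaves it. `copy_with_verification` is a hypothetical name, and the returned keys only mirror the `local_hash`, `remote_hash`, and `verified` fields printed above.

```python
import hashlib
import hmac
from pathlib import Path


def copy_with_verification(source: Path, dest: Path, chunk_size: int = 1024 * 1024) -> dict:
    """Copy source to dest while hashing, then re-hash dest and compare."""
    source_hash = hashlib.sha256()
    with open(source, "rb") as src, open(dest, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            source_hash.update(chunk)
            dst.write(chunk)

    # Re-read the written copy so the check covers what actually landed on disk.
    dest_hash = hashlib.sha256()
    with open(dest, "rb") as copied:
        for chunk in iter(lambda: copied.read(chunk_size), b""):
            dest_hash.update(chunk)

    remote_hex, local_hex = source_hash.hexdigest(), dest_hash.hexdigest()
    return {
        "remote_hash": remote_hex,
        "local_hash": local_hex,
        "verified": hmac.compare_digest(remote_hex, local_hex),
    }
```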

### Task 6: Triage Collection
**Input**: System requiring rapid assessment

**Process**:
1. Quick system inventory
2. Collect critical artifacts
3. Identify IOCs
4. Prioritize findings
5. Generate triage report

**Output**: Triage results with priorities

**Example**:
```python
from artifact_collection import TriageCollector

# Initialize triage collector
collector = TriageCollector(
    output_dir="/evidence/triage/",
    case_id="TRIAGE-001"
)

# Run quick triage
triage = collector.run_triage()

print(f"System: {triage.system_info.hostname}")
print(f"OS: {triage.system_info.os_version}")
print(f"Collection time: {triage.duration_seconds}s")

# Get alerts
for alert in triage.alerts:
    print(f"ALERT: {alert.severity} - {alert.description}")
    print(f"  Evidence: {alert.evidence}")

# Get quick IOCs
for ioc in triage.iocs:
    print(f"IOC: {ioc.type} - {ioc.value}")
    print(f"  Source: {ioc.source}")

# Get suspicious processes
for proc in triage.suspicious_processes:
    print(f"Suspicious: {proc.name} (PID {proc.pid})")
    print(f"  Reason: {proc.reason}")

# Get suspicious connections
for conn in triage.suspicious_connections:
    print(f"Connection: {conn.remote_addr}:{conn.remote_port}")
    print(f"  Process: {conn.process_name}")
    print(f"  Reason: {conn.reason}")

# Get persistence mechanisms
for persist in triage.persistence:
    print(f"Persistence: {persist.type}")
    print(f"  Path: {persist.path}")
    print(f"  Suspicious: {persist.is_suspicious}")

# Generate triage report
collector.generate_triage_report("/evidence/triage/triage_report.html")
```
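
The "Prioritize findings" step can be as simple as ordering alerts by severity. The sketch below assumes four severity labels (`critical`, `high`, `medium`, `low`). The source does not list the values the skill actually uses, so both the labels and the `Alert` class here are illustrative.

```python
from dataclasses import dataclass

# Assumed severity scale; lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Alert:
    severity: str
    description: str


def prioritize(alerts: list[Alert]) -> list[Alert]:
    """Return alerts most severe first; unknown severities sort last."""
    return sorted(
        alerts,
        key=lambda alert: SEVERITY_RANK.get(alert.severity.lower(), len(SEVERITY_RANK)),
    )
```

Because `sorted` is stable, alerts with the same severity keep the order in which they were found.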

### Task 7: Chain of Custody Management
**Input**: Collected artifacts

**Process**:
1. Document evidence items
2. Record handling events
3. Verify integrity
4. Generate custody log
5. Produce legal documentation

**Output**: Chain of custody documentation

**Example**:
```python
from artifact_collection import ChainOfCustody

# Initialize chain of custody
coc = ChainOfCustody(
    case_id="CASE-2024-001",
    case_name="Security Incident Investigation",
    custodian="John Doe"
)

# Add evidence items
item1 = coc.add_evidence(
    item_id="EVD-001",
    description="Memory dump from workstation",
    source_system="WORKSTATION01",
    acquisition_method="WinPMEM",
    acquisition_time="2024-01-15T10:30:00Z",
    original_location="Physical RAM",
    file_path="/evidence/case001/memory.raw",
    hash_sha256="abc123..."
)

item2 = coc.add_evidence(
    item_id="EVD-002",
    description="Windows Event Logs",
    source_system="WORKSTATION01",
    acquisition_method="Robocopy",
    acquisition_time="2024-01-15T10:45:00Z",
    original_location="C:\\Windows\\System32\\winevt\\Logs\\",
    file_path="/evidence/case001/eventlogs/",
    hash_sha256="def456..."
)

# Record custody transfer
coc.record_transfer(
    item_id="EVD-001",
    from_custodian="John Doe",
    to_custodian="Jane Smith",
    transfer_time="2024-01-15T14:00:00Z",
    reason="Transfer for analysis",
    location="Forensics Lab"
)

# Record evidence access
coc.record_access(
    item_id="EVD-001",
    accessor="Jane Smith",
    access_time="2024-01-15T14:30:00Z",
    purpose="Memory analysis",
    actions_performed="Parsed with Volatility"
)

# Verify evidence integrity
verification = coc.verify_all()
for item in verification:
    print(f"Item: {item.item_id}")
    print(f"  Current hash: {item.current_hash}")
    print(f"  Original hash: {item.original_hash}")
    print(f"  Verified: {item.verified}")

# Generate chain of custody report
coc.generate_report("/evidence/case001/chain_of_custody.pdf")

# Export custody log
coc.export_log("/evidence/case001/custody_log.json")
```
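
One way to make a custody log tamper-evident is to chain its entries by hash, so that each entry commits to the one before it. The sketch below illustrates that idea. It is not a description of how `ChainOfCustody` stores its log, and `append_event` and `verify_log` are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    body = {"event": event, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers its content and the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "entry_hash": _digest(event, prev_hash)}
    log.append(entry)
    return entry


def verify_log(log: list[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _digest(entry["event"], entry["prev_hash"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

A chain like this detects edits to existing entries. It does not by itself detect a log that was rebuilt from scratch, so the latest `entry_hash` would still need to be recorded somewhere outside the log.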

### Task 8: Evidence Packaging
**Input**: Collected artifacts

**Process**:
1. Organize artifacts
2. Calculate hashes
3. Create evidence container
4. Document contents
5. Seal package

**Output**: Sealed evidence package

**Example**:
```python
import os

from artifact_collection import EvidencePackager

# Initialize packager
packager = EvidencePackager(
    case_id="CASE-2024-001",
    examiner="John Doe"
)

# Add artifacts to package
packager.add_directory("/evidence/case001/volatile/")
packager.add_directory("/evidence/case001/windows/")
packager.add_file("/evidence/case001/notes.txt")

# Set package metadata
packager.set_metadata(
    case_name="Security Incident",
    description="Forensic artifacts from WORKSTATION01",
    collection_start="2024-01-15T10:00:00Z",
    collection_end="2024-01-15T12:00:00Z",
    source_system="WORKSTATION01"
)

# Create evidence package
# Read the password from the environment instead of hard-coding it in the script
# (EVIDENCE_PACKAGE_PASSWORD is an illustrative variable name)
package = packager.create_package(
    output_path="/evidence/packages/CASE-2024-001.zip",
    compress=True,
    encrypt=True,
    encryption_password=os.environ["EVIDENCE_PACKAGE_PASSWORD"]
)

print(f"Package: {package.path}")
print(f"Size: {package.size_mb}MB")
print(f"Files: {package.file_count}")
print(f"SHA256: {package.sha256}")

# Generate manifest
manifest = packager.generate_manifest()
for item in manifest.items:
    print(f"File: {item.relative_path}")
    print(f"  Size: {item.size}")
    print(f"  SHA256: {item.sha256}")

# Seal package (creates tamper-evident record)
seal = packager.seal_package()
print(f"Seal ID: {seal.seal_id}")
print(f"Seal time: {seal.timestamp}")
print(f"Seal hash: {seal.seal_hash}")
```
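
The packaging steps (calculate hashes, create container, document contents, seal) can be sketched with the standard `zipfile` module. This is an illustration under simplifying assumptions: the "seal" is just the SHA-256 of the embedded manifest, files are read whole for brevity, and encryption is omitted because the standard library's `zipfile` cannot create encrypted archives. `build_package` is a hypothetical name.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_package(source_dir: Path, package_path: Path) -> dict:
    """Zip source_dir and embed a manifest of per-file SHA-256 digests."""
    manifest = {}
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(source_dir).as_posix()
                manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
                archive.write(path, arcname=relative)
        # Assumes source_dir has no file already named manifest.json.
        manifest_bytes = json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8")
        archive.writestr("manifest.json", manifest_bytes)
    return {
        "file_count": len(manifest),
        # Digest of the manifest: changes if any listed file hash changes.
        "seal_hash": hashlib.sha256(manifest_bytes).hexdigest(),
    }
```

Keeping `package_path` outside `source_dir` avoids the archive trying to include itself.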

### Task 9: Selective Collection
**Input**: Target system and artifact specification

**Process**:
1. Parse collection specification
2. Identify target artifacts
3. Collect specified items
4. Verify collection
5. Document results

**Output**: Targeted artifact collection

**Example**:
```python
from artifact_collection import SelectiveCollector

# Initialize selective collector
collector = SelectiveCollector(
    output_dir="/evidence/selective/",
    case_id="CASE-2024-001"
)

# Define collection specification
spec = {
    "registry": ["HKLM\\SOFTWARE", "HKCU\\SOFTWARE"],
    "event_logs": ["Security", "System", "Application"],
    "directories": [
        "C:\\Users\\*\\Downloads",
        "C:\\Users\\*\\Documents"
    ],
    "files": [
        "C:\\Windows\\System32\\config\\SAM",
        "C:\\Windows\\System32\\config\\SYSTEM"
    ],
    "file_patterns": ["*.exe", "*.dll", "*.ps1"],
    "date_range": {
        "start": "2024-01-01",
        "end": "2024-01-31"
    }
}

# Collect based on specification
results = collector.collect(spec)

print(f"Items collected: {results.total_items}")
print(f"Size: {results.total_size_mb}MB")
print(f"Duration: {results.duration_seconds}s")

# Get collection details
for item in results.items:
    print(f"Collected: {item.source_path}")
    print(f"  Destination: {item.dest_path}")
    print(f"  Size: {item.size}")
    print(f"  SHA256: {item.sha256}")

# Generate selective collection report
collector.generate_report("/evidence/selective/collection_report.html")
```
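
The `file_patterns` and `date_range` keys in the specification imply a filter over file names and timestamps. A minimal sketch of such a filter follows, assuming case-insensitive glob matching on the file name and a comparison against the modification time in UTC. How the skill actually interprets these keys is not stated in this document, and `select_files` is a hypothetical name.

```python
import fnmatch
from datetime import date, datetime, timezone
from pathlib import Path


def select_files(root: Path, patterns: list[str], start: date, end: date) -> list[Path]:
    """Return files under root matching a pattern with mtime date in [start, end]."""
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        if not any(fnmatch.fnmatch(name, pattern.lower()) for pattern in patterns):
            continue
        # Interpret the modification time in UTC to avoid local-timezone drift.
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).date()
        if start <= modified <= end:
            selected.append(path)
    return selected
```

Modification times can be altered on a compromised system, so a date filter narrows a collection but is not proof that older files are irrelevant.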

### Task 10: Collection Verification
**Input**: Evidence collection directory

**Process**:
1. Read collection manifest
2. Verify file integrity
3. Check for missing items
4. Validate metadata
5. Generate verification report

**Output**: Verification results

**Example**:
```python
from artifact_collection import CollectionVerifier

# Initialize verifier
verifier = CollectionVerifier(
    collection_path="/evidence/case001/",
    manifest_path="/evidence/case001/manifest.json"
)

# Run full verification
verification = verifier.verify()

print(f"Verification result: {verification.status}")
print(f"Items verified: {verification.verified_count}")
print(f"Items failed: {verification.failed_count}")
print(f"Items missing: {verification.missing_count}")

# Get verification details
for item in verification.items:
    print(f"Item: {item.path}")
    print(f"  Expected hash: {item.expected_hash}")
    print(f"  Actual hash: {item.actual_hash}")
    print(f"  Status: {item.status}")
    if item.status != "verified":
        print(f"  Error: {item.error}")

# Check for integrity issues
issues = verifier.get_integrity_issues()
for issue in issues:
    print(f"ISSUE: {issue.type}")
    print(f"  Item: {issue.item}")
    print(f"  Description: {issue.description}")

# Verify chain of custody
coc_verification = verifier.verify_chain_of_custody()
print(f"Chain of custody valid: {coc_verification.valid}")

# Generate verification report
verifier.generate_report("/evidence/case001/verification_report.pdf")
```
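
Steps 1 to 3 of the process (read manifest, verify integrity, check for missing items) reduce to comparing recorded digests with freshly computed ones. The sketch below assumes a manifest that is a flat JSON object mapping relative paths to SHA-256 hex digests. The real manifest format of this skill is not specified here, and `verify_collection` is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path


def verify_collection(collection_dir: Path, manifest_path: Path) -> dict:
    """Compare files on disk against a {relative_path: sha256} manifest."""
    expected = json.loads(manifest_path.read_text(encoding="utf-8"))
    results = {"verified": [], "failed": [], "missing": []}
    for relative, expected_hash in sorted(expected.items()):
        target = collection_dir / relative
        if not target.is_file():
            results["missing"].append(relative)
            continue
        # Whole-file read for brevity; large artifacts would be hashed in chunks.
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        bucket = "verified" if actual == expected_hash.lower() else "failed"
        results[bucket].append(relative)
    return results
```

The three lists correspond to the `verified_count`, `failed_count`, and `missing_count` values printed in the example above.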

## Configuration

### Environment Variables
| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `EVIDENCE_OUTPUT` | Default output directory | No | `./evidence` |
| `ACQUISITION_TOOL` | Memory acquisition tool | No | Auto-detect |
| `HASH_ALGORITHM` | Hash algorithm for integrity | No | SHA256 |
| `COMPRESS_ARTIFACTS` | Compress collected artifacts | No | true |
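
As an illustration of how these variables and their defaults could be read, the sketch below loads them into a small settings object. The accepted spellings for a true value (`1`, `true`, `yes`, `on`) are an assumption, and `CollectionSettings` and `load_settings` are hypothetical names rather than part of the skill.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CollectionSettings:
    evidence_output: str
    acquisition_tool: Optional[str]  # None means auto-detect
    hash_algorithm: str
    compress_artifacts: bool


def load_settings(env: Mapping[str, str] = os.environ) -> CollectionSettings:
    """Read the documented variables, falling back to the documented defaults."""
    compress = env.get("COMPRESS_ARTIFACTS", "true").strip().lower()
    return CollectionSettings(
        evidence_output=env.get("EVIDENCE_OUTPUT", "./evidence"),
        acquisition_tool=env.get("ACQUISITION_TOOL") or None,
        hash_algorithm=env.get("HASH_ALGORITHM", "SHA256"),
        compress_artifacts=compress in {"1", "true", "yes", "on"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.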

### Options
| Option | Type | Description |
|--------|------|-------------|
| `include_memory` | boolean | Include memory dump |
| `compress` | boolean | Compress artifacts |
| `encrypt` | boolean | Encrypt evidence package |
| `verify_collection` | boolean | Verify after collection |
| `parallel_collection` | boolean | Parallel artifact collection |

## Examples

### Example 1: Incident Response Collection
**Scenario**: Rapid artifact collection during active incident

```python
from artifact_collection import IncidentResponseCollector

# Initialize IR collector
collector = IncidentResponseCollector(
    case_id="IR-2024-001",
    priority="high"
)

# Quick volatile collection
volatile = collector.collect_volatile()

# Critical artifacts only
critical = collector.collect_critical_artifacts()

# Generate IR report
collector.generate_ir_report("/evidence/ir_report.html")
```

### Example 2: Legal Hold Collection
**Scenario**: Collecting artifacts for legal proceedings

```python
from artifact_collection import LegalHoldCollector

# Initialize with legal requirements
collector = LegalHoldCollector(
    case_id="LEGAL-2024-001",
    legal_hold_id="LH-12345",
    custodian="John Doe"
)

# Collect with full chain of custody
artifacts = collector.collect_all()

# Generate court-ready documentation
collector.generate_legal_package("/evidence/legal/")
```

## Limitations

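- Memory acquisition requires administrative privileges (administrator or root) on the target system
- Some artifacts may be locked by running processes and cannot be copied directly
- Remote collection depends on network connectivity to the target
- Encrypted files cannot be decrypted without the corresponding keys
- Collection may degrade performance on the target system
- Large collections, memory dumps in particular, require substantial storage space
- Volatile artifacts can change or disappear during collection, so collect them first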

## Troubleshooting

### Common Issue 1: Access Denied
**Problem**: Cannot access certain files
**Solution**:
- Run with elevated privileges
- Use forensic boot media
- Deploy signed collection agent

### Common Issue 2: Memory Acquisition Failure
**Problem**: Cannot capture memory
**Solution**:
- Use alternative acquisition tool
- Check security software interference
- Verify driver compatibility

### Common Issue 3: Incomplete Collection
**Problem**: Some artifacts missing
**Solution**:
- Check for file locks
- Verify permissions
- Review collection logs
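
When locks or permissions block individual files, it helps if the collection records each failure and continues rather than aborting. The sketch below shows that pattern with the standard library. `copy_tolerant` is a hypothetical helper, and it keeps only the file name at the destination, so sources that share a name would overwrite each other.

```python
import shutil
from pathlib import Path


def copy_tolerant(sources: list[Path], dest_dir: Path) -> dict:
    """Copy each source into dest_dir, recording failures instead of aborting."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    outcome = {"copied": [], "skipped": {}}
    for source in sources:
        try:
            shutil.copy2(source, dest_dir / source.name)  # copy2 preserves timestamps
            outcome["copied"].append(str(source))
        except OSError as exc:  # covers PermissionError and FileNotFoundError
            outcome["skipped"][str(source)] = f"{type(exc).__name__}: {exc}"
    return outcome
```

The `skipped` mapping doubles as a collection log entry: it names each missing artifact and the error that prevented its collection.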

## Related Skills

- [memory-forensics](../memory-forensics/): Analyze collected memory
- [disk-forensics](../disk-forensics/): Analyze collected disk artifacts
- [timeline-forensics](../timeline-forensics/): Build timeline from artifacts
- [log-forensics](../log-forensics/): Analyze collected logs
- [incident-response](../../cybersecurity/incident-response/): IR workflow

## References

- [Artifact Collection Reference](references/REFERENCE.md)
- [Windows Artifacts Guide](references/WINDOWS_ARTIFACTS.md)
- [Linux Artifacts Guide](references/LINUX_ARTIFACTS.md)
- [Chain of Custody Guide](references/CHAIN_OF_CUSTODY.md)