---
name: malware-analysis-static
author: Gianni Amato
version: 1.3.0
homepage: https://github.com/guelfoweb/malware-analysis-static
description: Use this skill for static malware analysis and reverse engineering of suspicious binaries, Android APKs, Office documents, web payloads, scripts, source-code droppers, and multi-stage chains across Linux, macOS, and Windows. It guides Codex through a safe lab workflow, tool discovery with per-install authorization, triage, staged payload retrieval when authorized, decoding, evidence preservation, AGENTS.md memory, REPORT.md reporting, C2/drop URL extraction, and family attribution.
---

# Malware Analysis Static

Before starting, show this warning:

`Warning: malware analysis should be performed only on an isolated lab machine or disposable VM, never on a personal or production workstation.`

## Startup

1. Create or reuse a case directory.
2. Create or read `AGENTS.md` and `REPORT.md`.
3. Preserve the original sample before substantive analysis.

## Scope

Static analysis and reverse engineering for:

- Windows PE, DLL, driver, and .NET assemblies
- Linux ELF and macOS Mach-O binaries
- Android APKs and embedded native libraries
- Office/OLE/OOXML/RTF documents, VBA, VBS, JS, HTA, WSF, PowerShell
- HTML, HTM, MHTML, SVG, XML, XSL, webshells, and source-code droppers
- scripts in BAT, CMD, shell, Python, PHP, Perl, Ruby, Lua, AutoIt, Node.js, JSP, ASP, ASPX, and similar formats
- extracted, embedded, decoded, dropped, downloaded, or chained stages

Do not execute malware unless the user explicitly authorizes dynamic analysis in an isolated, disposable lab.

## Load The Right Playbook

Keep this file loaded as the core workflow.
Load only the playbook needed for the current sample or stage:

- Android APK: `references/apk.md`
- Windows PE, DLL, driver, .NET: `references/pe-dotnet.md`
- Office, VBA, VBS, JS, HTA, WSF, PowerShell, script droppers: `references/office-script.md`
- HTML, web payloads, source-code droppers, webshells: `references/web-payload.md`
- ELF, Mach-O, Unix shell payloads: `references/unix-binary.md`

For every new embedded, decoded, extracted, dropped, or downloaded stage, repeat type identification and load the matching playbook for that stage. Do not load unrelated playbooks unless the sample actually crosses formats.

## Analyst Objectives

For every sample and stage, determine when possible:

- file type, architecture, language, container, packer, and protection
- MD5, SHA1, SHA256
- parent-child stage relationship and execution or compromise chain order
- C2 URLs, domains, IPs, Telegram/API endpoints, panels, and fallback infrastructure
- drop URLs, stage URLs, update URLs, config URLs, and retrieval logic
- persistence, anti-analysis, privilege, lateral movement, and exfiltration behavior
- config, keys, campaign IDs, mutexes, service/task/receiver names, user agents
- decoding, decryption, decompression, unpacking, request-building, and staging algorithms
- likely family, cluster, loader, or toolkit (or `unknown`), with confidence and evidence

Make only claims the evidence supports. State uncertainty and static-analysis limits explicitly.

## Autonomy And Reverse Engineering Discipline

Do real reverse engineering, not IOC grepping.

- Continue automatically through the next reasonable analytical step within authorized scope.
- Do not ask whether to parse the next stream, resource, object, blob, macro, xref, class, section, overlay, function, or extracted file.
- Do not ask the user to choose between equivalent next steps; attempt the best one and document the result.
- If remote retrieval is not authorized, continue with offline extraction, carving, decoding, deobfuscation, xrefs, disassembly, decompilation, and decoder reconstruction.
- Do not stop at the first obstacle. Try fallback tools, lower-level views, alternate extraction, and manual reconstruction before declaring a blocker.
- Ask only when a missing tool must be installed, network retrieval or execution exceeds prior authorization, a real safety/cost decision exists, or an external dependency blocks progress.

Examples of valid escalation paths:

- strings -> decoded strings -> offsets -> xrefs -> disassembly -> decompilation
- manifest -> jadx source -> smali -> native libraries -> config decoder
- macro/script -> deobfuscation -> command reconstruction -> stage extraction
- PE metadata -> imports/resources -> config carving -> FLOSS/xorsearch -> Ghidra/radare2

## Token Budget And Anti-Loop Mode

These rules are mandatory:

- Keep chat minimal: status, key findings, evidence paths, next action.
- Never paste raw strings, logs, decompiler output, decoded blobs, candidate lists, or large command output in chat.
- Save verbose output under `case/` and inspect it with bounded commands such as `rg`, `head`, `tail`, `wc`, and `sort -u`.
- Save reusable helper code under `case/04-decoding/`; do not rewrite long inline scripts repeatedly.
- After two failed attempts on the same branch, document the blocker and pivot unless you have a new hypothesis.
- Do not claim success unless the artifact exists, has an identified file type, and has its MD5/SHA1/SHA256 recorded.
- Update `AGENTS.md` and `REPORT.md`; do not use chat as working memory.

## Tool Discovery And Installation

This skill must work without bundled scripts. Discover tools dynamically and use equivalents across Linux, macOS, and Windows.

Before deep analysis:

1. Detect OS and shell.
2. Identify sample type.
3. Check only the tools relevant to that type and its current stage.
4. Use available fallbacks when adequate.
5. Ask the user before installing each missing tool. Never install silently.
6. After authorization, use the native package manager or official installation path.
7. Record installed, missing, fallback, and degraded capabilities in `AGENTS.md`.

Discovery examples:

```bash
uname -s
# Prints one path per tool found; missing tools produce no output.
command -v file strings rg python3 node php perl bash sh pwsh yara objdump llvm-objdump readelf nm otool lipo r2 rabin2 jadx apktool aapt2 floss xorsearch ilspycmd monodis oleid olevba oledump.py exiftool binwalk 7z upx analyzeHeadless xmllint xxd base64 iconv
# Exits 0 only if both Python modules import.
python3 -c "import pefile, lief" 2>/dev/null
```

```powershell
$PSVersionTable.PSVersion
# Lists only the commands that were found.
Get-Command file,strings,rg,python,py,node,php,perl,pwsh,yara,objdump,llvm-objdump,dumpbin,r2,rabin2,jadx,apktool,aapt2,floss,xorsearch,ilspycmd,monodis,oleid,olevba,oledump.py,exiftool,7z,analyzeHeadless,xmllint -ErrorAction SilentlyContinue
# $LASTEXITCODE is 0 only if both Python modules import.
py -c "import pefile, lief" 2>$null
```

Core tools expected in most cases: `file`, hash tooling, `strings`, `rg`, `python3` or `py`, `7z`, `exiftool`, `yara`, `xxd`, `base64`.

Reverse-engineering tools expected when relevant: `objdump`, `llvm-objdump`, `readelf`, `nm`, `otool`, `lipo`, `radare2`/`r2`, `rabin2`, Ghidra headless `analyzeHeadless`, `binwalk`.

Format-specific tools: `jadx`, `apktool`, `aapt2`, `bundletool`, `apkanalyzer`, `floss`, `xorsearch`, `ilspycmd`, `monodis`, Python `pefile` and `lief`, `oletools` (`oleid`, `olevba`), `oledump.py`, `rtfobj`, `unzip`, `zipinfo`, `xmllint`, `iconv`, `node`, `php`, `perl`, `pwsh`, CyberChef CLI, beautifiers, AST parsers, `capa`, `upx`, `sigcheck`.

Install missing tools only after explicit user authorization. Prefer native package managers: Linux `apt`, `dnf`, `pacman`; macOS `brew`; Windows `winget` or `choco`; plus `pipx`/`pip` and `dotnet tool` where appropriate. Use official sources for Ghidra, jadx, apktool, radare2, YARA, Java, .NET SDK, and Python tools. Record installations in `AGENTS.md`.
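The discovery and recording steps can be sketched as a small bash function. It writes a present/missing inventory to a file that the `Environment And Tooling` section of `AGENTS.md` can cite. This is a minimal sketch, not a bundled script of this skill. The function name `inventory_tools` and the output path `case/01-triage/tooling.txt` are hypothetical. The sketch assumes bash, and it uses `python3` for the module probe.

```bash
#!/usr/bin/env bash
# Sketch: write a present/missing tool inventory for the current stage.
# Assumptions: bash; the output path <case>/01-triage/tooling.txt and the
# function name inventory_tools are hypothetical, not part of the skill.
set -u

# Usage: inventory_tools <case_dir> <tool> [<tool> ...]
inventory_tools() {
  local case_dir="$1"; shift
  local out="$case_dir/01-triage/tooling.txt"
  local t m path
  mkdir -p "$case_dir/01-triage" || return 1
  {
    printf '# OS: %s\n' "$(uname -s)"
    for t in "$@"; do
      # command -v prints the resolved path and fails when the tool is absent.
      if path=$(command -v "$t" 2>/dev/null); then
        printf 'present  %s  %s\n' "$t" "$path"
      else
        printf 'missing  %s\n' "$t"
      fi
    done
    # Python modules are invisible to command -v, so probe them by import.
    for m in pefile lief; do
      if python3 -c "import $m" 2>/dev/null; then
        printf 'present  python:%s\n' "$m"
      else
        printf 'missing  python:%s\n' "$m"
      fi
    done
  } > "$out"
}
```

For example, `inventory_tools case file strings rg python3 7z exiftool yara xxd base64` records the core tools. When a new stage type appears, you could rerun it with that format's tools. The function only reports what is on `PATH`. It never installs anything, so the per-install authorization rule still applies.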
## Case Layout

Create a case directory and preserve the original sample. Recommended layout:

```text
case/
  AGENTS.md
  REPORT.md
  00-intake/
  01-triage/
  02-strings/
  03-static/
  04-decoding/
  05-config/
  06-extracted/
  07-stages/
  08-yara/
  09-reports/
  10-iocs/
  11-notes/
```

Save command output to files when practical. Never overwrite the original sample or extracted evidence; create versioned names when needed.

## AGENTS.md Memory

Create or read `AGENTS.md` before substantive work. Update it after every meaningful action.

Required sections:

- `Current Summary`
- `Sample Inventory`
- `Environment And Tooling`
- `Timeline`
- `Stage Graph`
- `Confirmed Findings`
- `Decoders And Algorithms`
- `Family Attribution`
- `Evidence Index`
- `Open Questions`
- `Next Actions`
- `Final Assessment`

Use `AGENTS.md` as persistent working memory for context, decisions, failed attempts, fallback tools, extracted stages, provenance, decoder notes, and unresolved blockers.

## REPORT.md Requirements

Create `REPORT.md` during the analysis and update the same file if work continues. Keep it clear, simple, and readable.

Required content:

- case summary
- sample inventory with MD5, SHA1, SHA256, file type, size, and role
- ordered execution or compromise chain
- stage table with filename/label, parent, source URL or extraction path, hashes, file type, and role
- indicators separated as drop URLs, stage URLs, C2 URLs, domains, IP addresses, Telegram/API artifacts, panels, and fallback infrastructure
- decoding, decryption, unpacking, request-building, and staging logic
- persistence, anti-analysis, privilege, and exfiltration findings
- family or cluster assessment with evidence and confidence
- limitations and where the chain breaks, if incomplete

`REPORT.md` is the analyst-facing report. `AGENTS.md` is the working memory.

## Multi-Stage Handling

Treat every extracted, embedded, decoded, dropped, or downloaded component as a stage until proven irrelevant.

For each stage:

1. Assign a stable stage ID.
2. Record the parent sample, the origin (offset, path, URL, or code path), and the extraction or retrieval method.
3. Save the raw artifact under `case/07-stages/` or a clearly named evidence path.
4. Compute MD5, SHA1, SHA256 immediately.
5. Re-identify the type and load the matching playbook.
6. Continue analysis recursively within scope.
7. Update `AGENTS.md`, `REPORT.md`, and the stage graph.

When a confirmed drop URL, stage URL, payload URL, config URL, update URL, or fallback URL is found and retrieval is already authorized in a lab context:

- retrieve it without asking again
- save, hash, document, and analyze it as a new stage
- preserve the parent request logic: URL construction, parameters, headers, method, cookies, keys, offsets, wrappers, encodings, archives, rename rules, and post-download transforms
- if the parent code decodes, decrypts, decompresses, patches, or wraps the payload before use, reproduce and document that logic before concluding

When retrieval is not authorized:

- do not ask whether to retrieve it as the next step
- continue automatically with offline parsing, carving, deobfuscation, decoder reconstruction, embedded blob extraction, and deeper reverse engineering
- document remote retrieval as a next option only after the reasonable offline path is exhausted

## Required Workflow

1. Show the lab warning.
2. Create/reuse the case directory, `AGENTS.md`, and `REPORT.md`.
3. Preserve the original sample and compute MD5, SHA1, SHA256.
4. Identify file type, architecture, language, container, and likely stage role.
5. Check the relevant tools and ask before installing any missing tool that blocks progress.
6. Run triage: metadata, strings, Unicode strings, archive/container inspection, YARA where useful.
7. Load the matching playbook and perform deeper static analysis.
8. Reconstruct decoders, configs, command lines, request builders, and staging logic.
9. Extract and analyze embedded or downloaded stages using the multi-stage rules.
10. Attempt family attribution using defensible evidence.
11. Update `AGENTS.md` and `REPORT.md`.
12. Final response: concise findings, hashes, family candidates, C2/drop/stage URLs, evidence paths, report paths, confidence, and limits.

## Family Attribution

Identify the family, cluster, loader, toolkit, or campaign only when evidence supports it. Such evidence includes config layout, protocol, crypto constants, mutexes, service names, task names, receiver names, user-agent format, registry/filesystem conventions, builder markers, packer signatures, distinctive code, infrastructure, and stage relationships.

If attribution is weak, report candidate families or `unknown`, explain why, and lower the stated confidence.

For APKs, use the Android family-pattern dataset described in `references/apk.md` as supporting evidence only, never as proof by itself.

## Final Answer

Use concise structured output. Include:

- sample and stage count
- SHA256 of the primary sample
- likely family or candidate families
- key behavior and execution chain
- C2, drop URLs, stage URLs, domains, and IPs with origin
- decoding/decryption/staging logic
- persistence and anti-analysis
- extracted stages and hashes
- evidence locations
- `AGENTS.md` and `REPORT.md` status
- confidence and limitations

## Safety Rules

- Never run malware on a personal or production machine.
- Never execute samples, scripts, macros, HTML smuggling chains, or webshells unless explicitly authorized in an isolated lab.
- Never claim a URL is active, reachable, or malicious without evidence.
- Never destroy or overwrite original evidence.
- Keep decoder scripts minimal, deterministic, and operating on inert data.
- Keep dynamic observations separate from static findings.
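
A small shell helper saved under `case/04-decoding/` can enforce two evidence rules: never overwrite originals, and hash every stage immediately. The sketch below copies an artifact into the case tree and makes the copy read-only. It refuses to overwrite an existing label. It then appends the artifact's MD5, SHA1, SHA256, size, and file type to a tab-separated ledger. It is hypothetical, not the skill's own tooling. It assumes GNU coreutils (`md5sum`, `sha1sum`, `sha256sum`); on macOS, substitute `md5 -q` and `shasum -a 1` / `shasum -a 256`. The names `preserve_artifact`, `digest`, `LEDGER`, and `case/00-intake/hashes.tsv` are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: copy an artifact into the case tree read-only and log its hashes.
# Assumptions: GNU coreutils (md5sum, sha1sum, sha256sum); on macOS swap in
# `md5 -q` and `shasum -a 1` / `shasum -a 256`. The ledger path
# case/00-intake/hashes.tsv is hypothetical; override it with LEDGER=...
set -u

# Print only the hex digest of file $2, computed by hash tool $1 (via stdin).
digest() {
  "$1" < "$2" | cut -d' ' -f1
}

# Usage: preserve_artifact <source_file> <dest_dir> <label>
preserve_artifact() {
  local src="$1" dest_dir="$2" label="$3"
  local dest="$dest_dir/$label"
  local ledger="${LEDGER:-case/00-intake/hashes.tsv}"
  local md5 sha1 sha256 size ftype

  mkdir -p "$dest_dir" "$(dirname "$ledger")" || return 1
  # Never overwrite existing evidence; the caller must choose a new label.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite $dest" >&2
    return 1
  fi
  cp -p -- "$src" "$dest" || return 1
  chmod a-w "$dest"               # read-only copy; it is never executed

  md5=$(digest md5sum "$dest")
  sha1=$(digest sha1sum "$dest")
  sha256=$(digest sha256sum "$dest")
  size=$(wc -c < "$dest" | tr -d ' ')
  ftype=$(file -b -- "$dest" 2>/dev/null || echo unknown)

  # One tab-separated row per artifact: label, hashes, size, file type.
  printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    "$label" "$md5" "$sha1" "$sha256" "$size" "$ftype" >> "$ledger"
}
```

For example, `preserve_artifact ./sample.bin case/00-intake original.bin` preserves the intake sample. A carved or downloaded stage could use `case/07-stages/` with its stage ID as the label. Because the helper only copies and reads bytes, it stays within the inert-data rule above.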