--- name: reverse-engineering-malware-with-ghidra description: 'Reverse engineers malware binaries using NSA''s Ghidra disassembler and decompiler to understand internal logic, cryptographic routines, C2 protocols, and evasion techniques at the assembly and pseudo-C level. Activates for requests involving malware reverse engineering, disassembly analysis, decompilation, binary analysis, or understanding malware internals. ' domain: cybersecurity subdomain: malware-analysis tags: - malware - reverse-engineering - Ghidra - disassembly - decompilation version: 1.0.0 author: mahipal license: Apache-2.0 nist_csf: - DE.AE-02 - RS.AN-03 - ID.RA-01 - DE.CM-01 --- # Reverse Engineering Malware with Ghidra ## When to Use - Static and dynamic analysis have identified suspicious functionality that requires deeper code-level understanding - You need to reverse engineer C2 communication protocols, encryption algorithms, or custom obfuscation - Understanding the exact exploit mechanism or vulnerability targeted by a malware sample - Extracting hardcoded configuration data (C2 addresses, encryption keys, campaign IDs) embedded in compiled code - Developing precise YARA rules or detection signatures based on unique code patterns **Do not use** for initial triage of unknown samples; perform static analysis with PEStudio and behavioral analysis with Cuckoo first. ## Prerequisites - Ghidra 11.x installed (download from https://ghidra-sre.org/) with JDK 17+ - Analysis VM isolated from production network (Windows or Linux host) - Familiarity with x86/x64 assembly language and Windows API conventions - PDB symbol files for Windows system DLLs to improve decompilation accuracy - Ghidra scripts repository (ghidra_scripts) for automated analysis tasks - Secondary reference: IDA Free or Binary Ninja for cross-validation of analysis results ## Workflow ### Step 1: Create Project and Import Binary Set up a Ghidra project and import the malware sample: ``` 1. Launch Ghidra: ghidraRun (Linux) or ghidraRun.bat (Windows) 2. File -> New Project -> Non-Shared Project -> Select directory 3. File -> Import File -> Select malware binary 4. Ghidra auto-detects format (PE, ELF, Mach-O) and architecture 5. Accept default import options (or specify base address if known) 6. Double-click imported file to open in CodeBrowser 7. When prompted, run Auto Analysis with default analyzers enabled ``` **Headless analysis for automation:** ```bash # Run Ghidra headless analysis with decompiler /opt/ghidra/support/analyzeHeadless /tmp/ghidra_project MalwareProject \ -import suspect.exe \ -postScript ExportDecompilation.py \ -scriptPath /opt/ghidra/scripts/ \ -deleteProject ``` ### Step 2: Identify Key Functions and Entry Points Navigate the binary to locate critical code sections: ``` Navigation Strategy: ━━━━━━━━━━━━━━━━━━━ 1. Start at entry point (OEP) - follow execution from _start/WinMain 2. Check Symbol Tree for imported functions (Window -> Symbol Tree) 3. Search for cross-references to suspicious APIs: - VirtualAlloc/VirtualAllocEx (memory allocation for injection) - CreateRemoteThread (remote thread injection) - CryptEncrypt/CryptDecrypt (encryption operations) - InternetOpen/HttpSendRequest (C2 communication) - RegSetValueEx (persistence via registry) 4. Use Search -> For Strings to find embedded URLs, IPs, and paths 5. Check the Functions window sorted by size (large functions often contain core logic) ``` **Ghidra keyboard shortcuts for efficient navigation:** ``` G - Go to address Ctrl+E - Search for strings X - Show cross-references to current location Ctrl+Shift+F - Search memory for byte patterns L - Rename label/function ; - Add comment T - Retype variable Ctrl+L - Retype return value ``` ### Step 3: Analyze Decompiled Code Use Ghidra's decompiler to understand function logic: ```c // Example: Ghidra decompiler output for a decryption routine // Analyst renames variables and adds types for clarity void decrypt_config(BYTE *encrypted_data, int data_len, BYTE *key, int key_len) { // XOR decryption with rolling key for (int i = 0; i < data_len; i++) { encrypted_data[i] = encrypted_data[i] ^ key[i % key_len]; } return; } // Analyst actions in Ghidra: // 1. Right-click parameters -> Retype to correct types (BYTE*, int) // 2. Right-click variables -> Rename to meaningful names // 3. Add comments explaining the algorithm // 4. Set function signature to propagate types to callers ``` ### Step 4: Trace C2 Communication Logic Follow the network communication code path: ``` Analysis Steps for C2 Protocol Reverse Engineering: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1. Find InternetOpenA/WinHttpOpen call -> trace to wrapper function 2. Follow data flow from encrypted config -> URL construction 3. Identify HTTP method (GET/POST), headers, and body format 4. Locate response parsing logic (JSON parsing, custom binary protocol) 5. Map the C2 command dispatcher (switch/case or jump table) 6. Document the command set (download, execute, exfiltrate, update, uninstall) ``` **Ghidra Script for extracting C2 configuration:** ```python # Ghidra Python script: extract_c2_config.py # Run via Script Manager in Ghidra from ghidra.program.model.data import StringDataType from ghidra.program.model.symbol import SourceType # Search for XOR decryption patterns listing = currentProgram.getListing() memory = currentProgram.getMemory() # Find references to InternetOpenA symbol_table = currentProgram.getSymbolTable() for symbol in symbol_table.getExternalSymbols(): if "InternetOpen" in symbol.getName(): refs = getReferencesTo(symbol.getAddress()) for ref in refs: print("C2 init at: {}".format(ref.getFromAddress())) ``` ### Step 5: Analyze Encryption and Obfuscation Identify and document cryptographic routines: ``` Common Malware Encryption Patterns: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ XOR Cipher: Loop with XOR operation, often single-byte or rolling key RC4: Two loops (KSA + PRGA), 256-byte S-box initialization AES: Look for S-box constants (0x63, 0x7C, 0x77...) or calls to CryptEncrypt Base64: Lookup table with A-Za-z0-9+/= characters Custom: Combination of arithmetic operations (ADD, SUB, ROL, ROR with XOR) Identification Tips: - Search for constants: AES S-box, CRC32 table, MD5 init values - Look for loop structures operating on byte arrays - Check for Windows Crypto API usage (CryptAcquireContext -> CryptCreateHash -> CryptEncrypt) - FindCrypt Ghidra plugin automatically identifies crypto constants ``` ### Step 6: Document Findings and Create Detection Signatures Produce actionable intelligence from reverse engineering: ```bash # Generate YARA rule from unique code patterns found in Ghidra cat << 'EOF' > malware_family_x.yar rule MalwareFamilyX_Decryptor { meta: description = "Detects MalwareX decryption routine" author = "analyst" date = "2025-09-15" strings: // XOR decryption loop with hardcoded key $decrypt = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F3 } // C2 URL pattern after decryption $c2_pattern = "/gate.php?id=" ascii condition: uint16(0) == 0x5A4D and $decrypt and $c2_pattern } EOF ``` ## Key Concepts | Term | Definition | |------|------------| | **Disassembly** | Converting machine code bytes into human-readable assembly language instructions; Ghidra's Listing view shows disassembled code | | **Decompilation** | Lifting assembly code to pseudo-C representation for easier analysis; Ghidra's Decompile window provides this view | | **Cross-Reference (XREF)** | Reference showing where a function or data address is called from or used; essential for tracing code execution flow | | **Control Flow Graph (CFG)** | Visual representation of all possible execution paths through a function; reveals branching logic and loops | | **Original Entry Point (OEP)** | The actual start address of the malware code after unpacking; packers redirect execution through an unpacking stub first | | **Function Signature** | The return type, name, and parameter types of a function; applying correct signatures improves decompiler output quality | | **Ghidra Script** | Python or Java automation script executed within Ghidra to perform batch analysis, pattern searching, or data extraction | ## Tools & Systems - **Ghidra**: NSA's open-source software reverse engineering suite with disassembler, decompiler, and scripting support for multiple architectures - **IDA Pro/Free**: Industry-standard interactive disassembler; IDA Free provides x86/x64 cloud-based decompilation - **Binary Ninja**: Commercial reverse engineering platform with modern UI and extensive API for plugin development - **x64dbg**: Open-source x64/x32 debugger for Windows used alongside Ghidra for dynamic debugging of malware - **FindCrypt (Ghidra Plugin)**: Plugin that identifies cryptographic constants and algorithms in binary code ## Common Scenarios ### Scenario: Reversing Custom C2 Protocol **Context**: Behavioral analysis shows encrypted traffic to an external IP on a non-standard port. Network signatures cannot detect variants because the protocol is proprietary. Deep reverse engineering is needed to understand the protocol structure. **Approach**: 1. Import the unpacked sample into Ghidra and run full auto-analysis 2. Locate socket/WinHTTP API calls and trace backwards to the calling function 3. Identify the encryption routine called before data is sent (follow data flow from send/HttpSendRequest) 4. Reverse the encryption (XOR key extraction, RC4 key derivation, AES key location) 5. Map the command structure by analyzing the response parsing function (switch/case on command IDs) 6. Document the protocol format (header structure, command bytes, encryption method) 7. Create a protocol decoder script for network monitoring tools **Pitfalls**: - Not running the full auto-analysis before starting manual analysis (missing function boundaries and type propagation) - Ignoring indirect calls through function pointers or vtables (use cross-references to data holding function addresses) - Spending time on library code that Ghidra's Function ID (FID) or FLIRT signatures should have identified - Not saving Ghidra project progress frequently (analysis state can be lost on crashes) ## Output Format ``` REVERSE ENGINEERING ANALYSIS REPORT ===================================== Sample: unpacked_payload.exe SHA-256: abc123def456... Architecture: x86 (32-bit PE) Ghidra Project: MalwareX_Analysis FUNCTION MAP 0x00401000 main() - Entry point, initializes config 0x00401200 decrypt_config() - XOR decryption with 16-byte key 0x00401400 init_c2() - WinHTTP initialization, URL construction 0x00401800 c2_beacon() - HTTP POST beacon with system info 0x00401C00 cmd_dispatcher() - Switch on 12 command codes 0x00402000 inject_process() - Process hollowing into svchost.exe 0x00402400 persist_registry() - HKCU Run key persistence 0x00402800 exfil_data() - File collection and encrypted upload C2 PROTOCOL Method: HTTPS POST to /gate.php Encryption: RC4 with derived key (MD5 of bot_id + campaign_key) Bot ID Format: MD5(hostname + username + volume_serial) Beacon Interval: 60 seconds with 10% jitter Command Set: 0x01 - Download and execute file 0x02 - Execute shell command 0x03 - Upload file to C2 0x04 - Update configuration 0x05 - Uninstall and remove traces ENCRYPTION DETAILS Algorithm: RC4 Key Derivation: MD5(bot_id + "campaign_2025_q3") Hardcoded Seed: "campaign_2025_q3" at offset 0x00405A00 EXTRACTED IOCs C2 URLs: hxxps://update.malicious[.]com/gate.php hxxps://backup.evil[.]net/gate.php (failover) Campaign ID: campaign_2025_q3 RC4 Key Material: [see encryption details above] ```