--- name: deobfuscating-javascript-malware description: 'Deobfuscates malicious JavaScript code used in web-based attacks, phishing pages, and dropper scripts by reversing encoding layers, eval chains, string manipulation, and control flow obfuscation to reveal the original malicious logic. Activates for requests involving JavaScript malware analysis, script deobfuscation, web skimmer analysis, or obfuscated dropper investigation. ' domain: cybersecurity subdomain: malware-analysis tags: - malware - JavaScript - deobfuscation - web-malware - script-analysis version: 1.0.0 author: mahipal license: Apache-2.0 nist_csf: - DE.AE-02 - RS.AN-03 - ID.RA-01 - DE.CM-01 --- # Deobfuscating JavaScript Malware ## When to Use - Investigating a phishing page with obfuscated JavaScript that performs credential harvesting or redirect - Analyzing a web skimmer (Magecart-style) injected into an e-commerce site - Deobfuscating a JavaScript dropper that downloads and executes second-stage malware - Examining malicious email attachments containing HTML files with embedded obfuscated scripts - Analyzing browser exploit kits that use heavy JavaScript obfuscation to hide exploit delivery **Do not use** for obfuscated JavaScript that is merely minified production code; use a standard beautifier instead. ## Prerequisites - Node.js 18+ installed for executing and debugging JavaScript in a controlled environment - Python 3.8+ with `jsbeautifier` library for code formatting - Browser developer tools (Chrome DevTools) for controlled execution in an isolated browser - CyberChef (https://gchq.github.io/CyberChef/) for encoding/decoding operations - de4js or JStillery for automated JavaScript deobfuscation - Isolated analysis VM with no access to production systems or sensitive data ## Workflow ### Step 1: Safely Extract and Examine the Obfuscated Script Isolate the malicious JavaScript without executing it: ```bash # Extract JavaScript from HTML file python3 << 'PYEOF' from html.parser import HTMLParser class ScriptExtractor(HTMLParser): def __init__(self): super().__init__() self.in_script = False self.scripts = [] self.current = "" def handle_starttag(self, tag, attrs): if tag == "script": self.in_script = True self.current = "" def handle_endtag(self, tag): if tag == "script": self.in_script = False if self.current.strip(): self.scripts.append(self.current) def handle_data(self, data): if self.in_script: self.current += data with open("malicious_page.html") as f: parser = ScriptExtractor() parser.feed(f.read()) for i, script in enumerate(parser.scripts): with open(f"script_{i}.js", "w") as f: f.write(script) print(f"Extracted script_{i}.js ({len(script)} bytes)") PYEOF # Beautify the extracted JavaScript npx js-beautify script_0.js -o script_0_pretty.js ``` ### Step 2: Identify Obfuscation Techniques Categorize the obfuscation methods used: ``` Common JavaScript Obfuscation Techniques: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ String Encoding: - Hex encoding: "\x68\x65\x6c\x6c\x6f" -> "hello" - Unicode escapes: "\u0068\u0065\u006c\u006c\u006f" -> "hello" - Base64: atob("aGVsbG8=") -> "hello" - charCodeAt/fromCharCode: String.fromCharCode(104,101,108,108,111) - Array-based lookup: var _0x1234 = ["hello","world"]; _0x1234[0] Eval Chains: - eval(atob("...")) - eval(unescape("...")) - new Function("return " + decoded)() - document.write("") - setTimeout(decoded, 0) Control Flow: - Switch-case dispatcher with shuffled case order - Opaque predicates (always-true/false conditions) - Dead code insertion - Variable name mangling (_0x4a3b, _0xab12) Anti-Analysis: - Debugger traps: setInterval(function(){debugger;}, 100) - Console detection: overriding console.log - Timing checks: performance.now() deltas - DevTools detection: window.outerWidth - window.innerWidth > 100 ``` ### Step 3: Remove Anti-Analysis Protections Neutralize anti-debugging and anti-analysis traps: ```javascript // Remove debugger traps before analysis // Replace in the obfuscated script: // Before: setInterval(function() { debugger; }, 100); // After (neutralized): setInterval(function() { /* debugger removed */ }, 100); // Neutralize DevTools detection // Before: if (window.outerWidth - window.innerWidth > 160) { window.location = "about:blank"; } // After: if (false) { window.location = "about:blank"; } // Neutralize timing checks // Override performance.now to return consistent values const originalNow = performance.now; performance.now = function() { return 0; }; ``` ### Step 4: Decode String Obfuscation Layers Progressively decode encoded strings: ```python # Python script to decode common JS obfuscation patterns import re import base64 import urllib.parse def decode_hex_strings(code): """Replace \\xNN sequences with ASCII characters""" def hex_replace(match): hex_str = match.group(0) try: return bytes.fromhex(hex_str.replace("\\x", "")).decode("ascii") except: return hex_str return re.sub(r'(?:\\x[0-9a-fA-F]{2})+', hex_replace, code) def decode_unicode_escapes(code): """Replace \\uNNNN sequences with characters""" def unicode_replace(match): return chr(int(match.group(1), 16)) return re.sub(r'\\u([0-9a-fA-F]{4})', unicode_replace, code) def decode_charcode_arrays(code): """Resolve String.fromCharCode calls""" def charcode_replace(match): codes = [int(c.strip()) for c in match.group(1).split(",")] return '"' + "".join(chr(c) for c in codes) + '"' return re.sub(r'String\.fromCharCode\(([0-9,\s]+)\)', charcode_replace, code) def decode_base64_strings(code): """Resolve atob() calls with static strings""" def atob_replace(match): try: decoded = base64.b64decode(match.group(1)).decode("utf-8") return f'"{decoded}"' except: return match.group(0) return re.sub(r'atob\(["\']([A-Za-z0-9+/=]+)["\']\)', atob_replace, code) # Apply all decoders with open("script_0.js") as f: code = f.read() code = decode_hex_strings(code) code = decode_unicode_escapes(code) code = decode_charcode_arrays(code) code = decode_base64_strings(code) with open("script_0_decoded.js", "w") as f: f.write(code) print("Decoded strings written to script_0_decoded.js") ``` ### Step 5: Resolve Eval Chains Safely Unwrap eval/Function constructor chains without executing: ```javascript // Node.js script to safely resolve eval chains // Run in isolated environment: node --experimental-vm-modules deobfuscate.js const vm = require('vm'); // Create sandboxed context with logging const sandbox = { eval: function(code) { console.log("=== EVAL INTERCEPTED ==="); console.log(code.substring(0, 500)); console.log("========================"); return code; // Return the code instead of executing it }, document: { write: function(html) { console.log("=== DOCUMENT.WRITE INTERCEPTED ==="); console.log(html.substring(0, 500)); }, getElementById: function() { return { innerHTML: "" }; } }, window: { location: { href: "" } }, atob: function(s) { return Buffer.from(s, 'base64').toString(); }, unescape: unescape, setTimeout: function(fn) { if (typeof fn === 'string') console.log("TIMEOUT CODE:", fn); }, console: console, String: String, Array: Array, parseInt: parseInt, RegExp: RegExp, }; const context = vm.createContext(sandbox); // Load and execute the obfuscated script in sandbox const fs = require('fs'); const code = fs.readFileSync('script_0.js', 'utf8'); try { vm.runInContext(code, context, { timeout: 5000 }); } catch(e) { console.log("Execution error (expected):", e.message); } ``` ### Step 6: Analyze the Deobfuscated Payload Examine the revealed malicious logic: ``` Deobfuscated Malware Categories and IOC Extraction: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Credential Harvester: - Form action URLs (exfiltration endpoints) - XMLHttpRequest/fetch destinations - Targeted input field names (username, password, cc_number) Web Skimmer (Magecart): - Payment form overlay injection - Card data exfiltration URLs - Keylogger event listeners (onkeypress, oninput) Redirect Script: - Destination URLs in location.href assignments - Conditional redirects based on user-agent or referrer - Cloaking logic (show benign content to bots) Exploit Kit Landing: - Browser/plugin version checks - Exploit payload URLs - Shellcode embedded as arrays or encoded strings ``` ## Key Concepts | Term | Definition | |------|------------| | **Eval Chain** | Nested layers of eval(), Function(), or document.write() calls that each decode one layer of obfuscation before passing to the next | | **String Array Rotation** | Obfuscation technique storing all strings in a shuffled array and accessing them by computed index to hide string literals | | **Dead Code Insertion** | Adding non-functional code blocks that never execute to increase analysis complexity and confuse pattern matching | | **Opaque Predicate** | Conditional expression whose outcome is predetermined but difficult to determine statically; used to obscure control flow | | **Anti-Debugging** | JavaScript techniques to detect and thwart browser DevTools or debugger usage including debugger statements and timing checks | | **Web Skimmer** | Malicious JavaScript injected into e-commerce sites to steal payment card data from checkout forms (Magecart attack) | ## Tools & Systems - **CyberChef**: GCHQ's web-based tool for encoding/decoding transformations useful for unwinding multi-layer obfuscation - **de4js**: Online JavaScript deobfuscator supporting common obfuscation tools (obfuscator.io, JScrambler) - **Node.js VM Module**: Sandboxed JavaScript execution environment for safely evaluating obfuscated code with intercepted APIs - **Chrome DevTools**: Browser developer tools for stepping through JavaScript execution with breakpoints and console access - **JSDetox**: JavaScript malware analysis tool providing execution emulation and deobfuscation ## Common Scenarios ### Scenario: Deobfuscating a Magecart Web Skimmer **Context**: A compromised e-commerce site has obfuscated JavaScript injected into its checkout page. The script needs deobfuscation to identify the data exfiltration endpoint and determine what customer data was stolen. **Approach**: 1. Extract the injected script from the page source (often appended to a legitimate JS file or loaded from an external domain) 2. Beautify the code and identify the obfuscation technique (typically string array + rotation + hex encoding) 3. Decode string encoding layers (hex -> Unicode -> base64) using the Python decoder script 4. Resolve the string array by evaluating the array definition and rotation function 5. Identify the form targeting logic (querySelector for payment form fields) 6. Extract the exfiltration URL from the XMLHttpRequest or fetch call 7. Document stolen data fields and exfiltration endpoint for incident response **Pitfalls**: - Executing obfuscated scripts on a connected system (the script may phone home during analysis) - Not removing anti-debugging traps before using browser DevTools (infinite debugger loops) - Missing additional obfuscation layers loaded dynamically from external URLs - Overlooking base64-encoded inline images or data URIs that may contain additional scripts ## Output Format ``` JAVASCRIPT MALWARE DEOBFUSCATION REPORT ========================================= Source: checkout.js (injected into example-shop.com) Obfuscation: obfuscator.io (string array + rotation + hex encoding) Layers Removed: 3 OBFUSCATION TECHNIQUES IDENTIFIED [1] String array with 247 entries, rotated by 0x1a3 [2] Hex-encoded string references (\x68\x65\x6c\x6c\x6f) [3] Base64-wrapped eval chain (2 layers) [4] Anti-debugging: setInterval debugger trap DEOBFUSCATED FUNCTIONALITY Type: Magecart Payment Card Skimmer Target Forms: input[name*="card"], input[name*="cc_"] Data Captured: Card number, expiration, CVV, cardholder name Exfil Method: POST via XMLHttpRequest Exfil URL: hxxps://analytics-cdn[.]com/collect Exfil Format: JSON { "cn": card_number, "exp": expiry, "cv": cvv } Trigger: Form submit event on checkout page EXTRACTED IOCs Domains: analytics-cdn[.]com IPs: 185.220.101[.]42 URLs: hxxps://analytics-cdn[.]com/collect hxxps://analytics-cdn[.]com/gate.js ```