--- name: assembly-x86 description: x86-64 assembly skill for reading, writing, and debugging assembly code. Use when reading GCC/Clang assembly output, writing inline asm in C/C++, understanding the System V AMD64 ABI calling convention, or debugging register and stack state. Activates on queries about x86-64 assembly, AT&T vs Intel syntax, inline asm, calling conventions, SIMD intrinsics, or reading disassembly output from objdump or GDB. --- # x86-64 Assembly ## Purpose Guide agents through x86-64 assembly: reading compiler output, understanding the ABI, writing inline asm, and common patterns. ## Triggers - "How do I read the assembly GCC generated?" - "What are the x86-64 registers?" - "What is the calling convention on Linux/macOS?" - "How do I write inline assembly in C?" - "How do I use SSE/AVX intrinsics?" - "This assembly uses `%rsp` / `%rbp` — what does it mean?" ## Workflow ### 1. Generate and read assembly ```bash # AT&T syntax (GCC default) gcc -S -O2 -fverbose-asm foo.c -o foo.s # Intel syntax gcc -S -masm=intel -O2 foo.c -o foo.s # From GDB (gdb) disassemble /s main # with source (gdb) x/20i $rip # From objdump objdump -d -M intel -S prog # Intel + source (needs -g) ``` ### 2. x86-64 registers | 64-bit | 32-bit | 16-bit | 8-bit high | 8-bit low | Purpose | |--------|--------|--------|-----------|----------|---------| | `%rax` | `%eax` | `%ax` | `%ah` | `%al` | Return value / accumulator | | `%rbx` | `%ebx` | `%bx` | `%bh` | `%bl` | Callee-saved | | `%rcx` | `%ecx` | `%cx` | `%ch` | `%cl` | 4th arg / count | | `%rdx` | `%edx` | `%dx` | `%dh` | `%dl` | 3rd arg / 2nd return | | `%rsi` | `%esi` | `%si` | — | `%sil` | 2nd arg | | `%rdi` | `%edi` | `%di` | — | `%dil` | 1st arg | | `%rbp` | `%ebp` | `%bp` | — | `%bpl` | Frame pointer (callee-saved) | | `%rsp` | `%esp` | `%sp` | — | `%spl` | Stack pointer | | `%r8`–`%r11` | `%r8d`–`%r11d` | `%r8w`–`%r11w` | — | `%r8b`–`%r11b` | 5th–8th args / caller-saved | | `%r12`–`%r15` | `%r12d`–`%r15d` | `%r12w`–`%r15w` | — | `%r12b`–`%r15b` | Callee-saved | | `%rip` | | | | | Instruction pointer | | `%rflags` | `%eflags` | | | | Status flags | | `%xmm0`–`%xmm7` | | | | | FP/SIMD args and return | | `%xmm8`–`%xmm15` | | | | | Caller-saved SIMD | | `%ymm0`–`%ymm15` | | | | | AVX 256-bit | | `%zmm0`–`%zmm31` | | | | | AVX-512 512-bit | ### 3. System V AMD64 ABI (Linux, macOS, FreeBSD) **Integer/pointer argument registers (in order):** `%rdi, %rsi, %rdx, %rcx, %r8, %r9` **Floating-point argument registers:** `%xmm0`–`%xmm7` **Return values:** - Integer: `%rax` (low), `%rdx` (high if 128-bit) - Float: `%xmm0` (low), `%xmm1` (high) **Caller-saved (scratch):** `%rax, %rcx, %rdx, %rsi, %rdi, %r8–%r11, %xmm0–%xmm15` **Callee-saved (must preserve):** `%rbx, %rbp, %r12–%r15` **Stack:** 16-byte aligned before `call`; `call` pushes 8 bytes → 16-byte aligned at function entry after prologue. **Red zone:** 128 bytes below `%rsp` may be used by leaf functions without adjusting `%rsp`. Not available in kernel/signal handlers. ### 4. Common instruction patterns | Pattern | Meaning | |---------|---------| | `mov %rdi, %rax` | Copy rdi to rax | | `mov (%rdi), %rax` | Load 8 bytes from address in rdi | | `mov %rax, 8(%rdi)` | Store rax to rdi+8 | | `lea 8(%rdi), %rax` | Load effective address rdi+8 into rax (no memory access) | | `push %rbx` | Push rbx; rsp -= 8 | | `pop %rbx` | Pop into rbx; rsp += 8 | | `call foo` | Push return addr; jmp foo | | `ret` | Pop return addr; jmp to it | | `xor %eax, %eax` | Zero rax (smaller encoding than `mov $0, %rax`) | | `test %rax, %rax` | Set ZF if rax == 0 (cheaper than `cmp $0, %rax`) | | `cmp $5, %rdi` | Set flags for rdi - 5 | | `jl label` | Jump if signed less than | ### 5. AT&T vs Intel syntax | Feature | AT&T | Intel | |---------|------|-------| | Operand order | source, dest | dest, source | | Register prefix | `%rax` | `rax` | | Immediate prefix | `$42` | `42` | | Memory operand | `8(%rdi)` | `[rdi+8]` | | Size suffix | `movl`, `movq` | — (inferred) | GCC emits AT&T by default. Use `-masm=intel` for Intel syntax. ### 6. Inline assembly (GCC extended asm) ```c // Basic: increment a register int x = 5; __asm__ volatile ( "incl %0" : "=r"(x) // outputs: =r means write-only register : "0"(x) // inputs: 0 means same as output 0 : // clobbers: none ); // CPUID example uint32_t eax, ebx, ecx, edx; __asm__ volatile ( "cpuid" : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx) : "a"(1) // input: leaf 1 ); // Atomic increment static inline int atomic_inc(volatile int *p) { int ret; __asm__ volatile ( "lock; xaddl %0, %1" : "=r"(ret), "+m"(*p) : "0"(1) : "memory" ); return ret + 1; } ``` Constraint codes: - `"r"` — any general register - `"m"` — memory operand - `"i"` — immediate integer - `"a"`, `"b"`, `"c"`, `"d"` — specific registers (%rax, %rbx, %rcx, %rdx) - `"="` prefix — output (write-only) - `"+"` prefix — read-write - `"memory"` clobber — tells compiler memory may be modified (barrier) ### 7. SSE/AVX intrinsics (preferred over inline asm) ```c #include // includes all x86 SIMD headers // Add 8 floats at once with AVX __m256 a = _mm256_loadu_ps(arr_a); // load 8 floats (unaligned) __m256 b = _mm256_loadu_ps(arr_b); __m256 c = _mm256_add_ps(a, b); _mm256_storeu_ps(result, c); ``` Check CPU support at compile time: `-mavx2` or `-march=native`. Check at runtime: `__builtin_cpu_supports("avx2")`. For a full register and instruction reference, see [references/reference.md](references/reference.md). ## Related skills - Use `skills/low-level-programming/assembly-arm` for AArch64/ARM assembly - Use `skills/compilers/gcc` for `-S -masm=intel` flag details - Use `skills/debuggers/gdb` for stepping through assembly (`si`, `ni`, `x/i`)