---
layout: post
title: Tiny C Binaries
author: Dylan Müller
author_url: https://linkedin.com/in/dylanmuller
---
> By default, following the linking stage, GCC generates ELF binaries that contain
> redundant section data that increase executable size.
1. [ELF Binaries](#elf-binaries)
2. [Size Optimisation](#size-optimisation)
3. [Linux Syscalls](#linux-syscalls)
4. [Custom Linker Script](#custom-linker-script)
5. [GCC flags](#gcc-flags)
6. [SSTRIP](#sstrip)
7. [Source Code](#source-code)
# ELF Binaries
The standard file format for executable object code on Linux is ELF (Executable
and Linkable Format), the successor to the older COFF UNIX file format.
An ELF binary consists of an ELF header followed by file data: the program
header table, the section header table, and the code and data they describe.
The ELF header format for 64-bit binaries is shown in the table below:
| Offset | Field | Description | Value |
|--------|-------|-------------|-------|
| 0x00 | e_ident[EI_MAG0..EI_MAG3] | magic number | 0x7F 0x45 0x4C 0x46 (0x7F 'E' 'L' 'F') |
| 0x04 | e_ident[EI_CLASS] | 32/64-bit class | 0x1 = 32-bit<br>0x2 = 64-bit |
| 0x05 | e_ident[EI_DATA] | endianness | 0x1 = little<br>0x2 = big |
| 0x06 | e_ident[EI_VERSION] | ELF version | 0x1 = original |
| 0x07 | e_ident[EI_OSABI] | system ABI | 0x00 = System V<br>0x02 = NetBSD<br>0x03 = Linux<br>0x09 = FreeBSD |
| 0x08 | e_ident[EI_ABIVERSION] | ABI version | ignored for statically linked binaries<br>vendor specific for dynamically linked binaries |
| 0x09 | e_ident[EI_PAD] | padding (bytes 0x09 to 0x0F) | padded with zeros |
| 0x10 | e_type | object type | 0x00 = ET_NONE<br>0x01 = ET_REL<br>0x02 = ET_EXEC<br>0x03 = ET_DYN<br>0x04 = ET_CORE |
| 0x12 | e_machine | target ISA | 0x3E = x86-64 (amd64)<br>0xB7 = AArch64 (ARMv8, 64-bit) |
| 0x14 | e_version | ELF version | 0x1 = original |
| 0x18 | e_entry | entry point | 64-bit entry point virtual address |
| 0x20 | e_phoff | program header table offset | 64-bit file offset of the program header table |
| 0x28 | e_shoff | section header table offset | 64-bit file offset of the section header table |
| 0x30 | e_flags | processor-specific flags | vendor specific, or zero |
| 0x34 | e_ehsize | ELF header size | 0x40 = 64-bit, 0x34 = 32-bit |
| 0x36 | e_phentsize | program header table entry size | 0x38 for 64-bit |
| 0x38 | e_phnum | number of program header table entries | - |
| 0x3A | e_shentsize | section header table entry size | 0x40 for 64-bit |
| 0x3C | e_shnum | number of section header table entries | - |
| 0x3E | e_shstrndx | index of the section header that holds the section names | - |
| 0x40 | | | end of 64-bit ELF header |
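These fields are used by the Linux program loader to find the entry point for
code execution and the program header table, and to validate fields such as the
class, endianness and ISA type. The loader maps the file using the program
header table (segments); the section header table is mainly used by linkers and
tools such as `objdump`.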
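To see these fields on a real binary, the sketch below reads the first 64 bytes of an ELF file into glibc's `Elf64_Ehdr` structure from `<elf.h>` and prints a few of them. The `_Static_assert` lines check several offsets from the table above against that structure. It assumes a Linux host with glibc headers. `read_elf64_header` and `print_elf64_header` are hypothetical names, not part of the author's code.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Offsets from the header table, checked against glibc's Elf64_Ehdr. */
_Static_assert(offsetof(Elf64_Ehdr, e_type) == 0x10, "e_type offset");
_Static_assert(offsetof(Elf64_Ehdr, e_entry) == 0x18, "e_entry offset");
_Static_assert(offsetof(Elf64_Ehdr, e_shstrndx) == 0x3E, "e_shstrndx offset");
_Static_assert(sizeof(Elf64_Ehdr) == 0x40, "64-bit ELF header size");

/* Hypothetical helper: read and sanity-check the ELF header of `path`.
   Returns 0 on success, -1 if the file is unreadable or not ELF64. */
int read_elf64_header(const char *path, Elf64_Ehdr *hdr)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(hdr, sizeof *hdr, 1, f);
    fclose(f);
    if (got != 1)
        return -1;
    /* e_ident[0..3] must be 0x7F 'E' 'L' 'F' */
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Hypothetical helper: print the fields the loader relies on. */
void print_elf64_header(const Elf64_Ehdr *h)
{
    printf("data:    %s\n",
           h->e_ident[EI_DATA] == ELFDATA2LSB ? "little endian" : "big endian");
    printf("type:    0x%x\n", (unsigned)h->e_type);
    printf("machine: 0x%x\n", (unsigned)h->e_machine);
    printf("entry:   0x%llx\n", (unsigned long long)h->e_entry);
    printf("phoff:   0x%llx (%u entries)\n",
           (unsigned long long)h->e_phoff, (unsigned)h->e_phnum);
    printf("shoff:   0x%llx (%u entries)\n",
           (unsigned long long)h->e_shoff, (unsigned)h->e_shnum);
}
```

Calling `read_elf64_header("example", &h)` followed by `print_elf64_header(&h)` should report type 0x3 (`ET_DYN`) where GCC builds position-independent executables by default, or 0x2 (`ET_EXEC`) otherwise.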
A sample hello world program is shown below. It was compiled with GCC using
`gcc main.c -o example`:
```c
#include <stdio.h>

int main(int argc, char *argv[]){
    printf("Hello, World!");
    return 0;
}
```
This produced an output executable of **~17 KB**! If you've ever programmed in
assembly, you might be surprised at the rather large file size for such a
simple program.
The GNU binutils `objdump` tool lets us inspect the full list of ELF sections
with the `-h` flag.
Running `objdump -h example` on our sample binary shows a large number of
GCC-generated sections, such as `.gnu.version` and `.note.gnu.property`,
attached to the binary image. The question is how much data these additional
sections consume, and to what degree we can strip out redundant data.
![objdump -h output for example](https://lunarjournal.github.io/images/2/01.png)
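To put numbers on this question, the sketch below walks the section header table and prints each section's name and size. It locates the table through `e_shoff`, `e_shnum` and `e_shstrndx` from the ELF header, and shows a tiny subset of what `objdump -h` reports. It assumes a 64-bit ELF file and glibc's `<elf.h>`. `dump_sections` and `load_file` are hypothetical helpers. The bounds checking is minimal (for example, `sh_name` offsets are trusted), so do not point it at untrusted files.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size);
        if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf)
        *len = (size_t)size;
    return buf;
}

/* Minimal `objdump -h`: print each section's name and size.
   Returns the bytes the sections occupy in the file (SHT_NOBITS
   sections such as .bss take no file space), or -1 on error. */
long long dump_sections(const char *path)
{
    size_t len = 0;
    unsigned char *buf = load_file(path, &len);
    if (!buf)
        return -1;
    long long total = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (len >= sizeof *eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 && eh->e_shstrndx < eh->e_shnum &&
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) <= len) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
        const Elf64_Shdr *strtab = &sh[eh->e_shstrndx];
        if (strtab->sh_offset + strtab->sh_size <= len) {
            const char *names = (const char *)buf + strtab->sh_offset;
            total = 0;
            for (int i = 0; i < eh->e_shnum; i++) {
                printf("%-24s %8llu\n", names + sh[i].sh_name,
                       (unsigned long long)sh[i].sh_size);
                if (sh[i].sh_type != SHT_NOBITS)
                    total += (long long)sh[i].sh_size;
            }
        }
    }
    free(buf);
    return total;
}
```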
GNU binutils also comes with a handy utility called `strip`, which removes the
symbol table and other non-essential sections from a binary. Running
`strip -s example` only reduces the file slightly, to around **~14.5 KB**.
Clearly, we need to strip much more! :open_mouth:
# Size Optimisation
GCC provides a large number of optimisation flags. These include the common
`-O2`, `-O3` and `-Os` flags, as well as many less widely used compile-time
options, which we will explore further. Since we have not compiled with any
optimisation so far, as a first step we recompile the example above with `-Os`
to optimise for size (`gcc -Os main.c -o example`).
We see no decrease in size! This is expected, however. `-Os` only changes how
GCC generates code, and the code for this program is a tiny fraction of the
file. The additional section information that GCC places in the output binary
is not considered for removal at this level of optimisation.
In addition, calling `printf` links the executable against the C standard
library, along with its startup code. We will therefore call the Linux kernel
directly to print to the standard output stream.
# Linux syscalls
System calls on Linux are invoked with the x86_64 `syscall` instruction, and
their parameters are passed in a fixed register order. For x86_64
([System V ABI - Section
A.2.1](https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf)), the order
of parameters for Linux system calls is as follows:
| # | description | register (64-bit) |
|---|----------------|----------|
| 1 | syscall number | rax |
| 2 | arg 1 | rdi |
| 3 | arg 2 | rsi |
| 4 | arg 3 | rdx |
| 5 | arg 4 | r10 |
| 6 | arg 5 | r8 |
| 7 | arg 6 | r9 |
In user mode, however, C function arguments are passed according to the System
V AMD64 calling convention, in the following order:
| # | description | register (64-bit) |
|---|-------------|-----|
| 1 | arg 1 | rdi |
| 2 | arg 2 | rsi |
| 3 | arg 3 | rdx |
| 4 | arg 4 | rcx |
| 5 | arg 5 | r8 |
| 6 | arg 6 | r9 |
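To call the Linux kernel from C, an assembly wrapper was therefore required to
translate user mode arguments (C formal parameters) into kernel syscall
arguments. The fourth argument differs because the `syscall` instruction
overwrites `rcx` (with the return address) and `r11` (with the saved flags), so
the kernel takes it in `r10` instead. The result is returned in `rax`: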
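```
syscall:
    mov rax,rdi    /* C arg 1 -> syscall number */
    mov rdi,rsi    /* C arg 2 -> syscall arg 1 */
    mov rsi,rdx    /* C arg 3 -> syscall arg 2 */
    mov rdx,rcx    /* C arg 4 -> syscall arg 3 */
    mov r10,r8     /* C arg 5 -> syscall arg 4 */
    mov r8,r9      /* C arg 6 -> syscall arg 5 */
    syscall        /* kernel result is left in rax */
    ret
```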
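For comparison, GCC extended inline assembly can hand the register shuffle to the compiler, so a C function issues `syscall` directly. This is a different technique from the author's separate assembly wrapper. The sketch assumes x86_64 Linux and GCC or Clang, and `raw_syscall` is a hypothetical name.

```c
/* Hypothetical helper: issue a raw x86_64 Linux system call with up to six
   arguments. Returns the kernel's result (a negative errno on failure). */
static long raw_syscall(long n, long a1, long a2, long a3,
                        long a4, long a5, long a6)
{
    /* No single-letter constraints exist for r10, r8 and r9, so bind
       local register variables to them. */
    register long r10 __asm__("r10") = a4;
    register long r8 __asm__("r8") = a5;
    register long r9 __asm__("r9") = a6;
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                 /* result in rax */
                     : "a"(n), "D"(a1), "S"(a2), /* rax, rdi, rsi */
                       "d"(a3), "r"(r10), "r"(r8), "r"(r9)
                     : "rcx", "r11", "memory");  /* clobbered by syscall */
    return ret;
}
```

The `"a"`, `"D"`, `"S"` and `"d"` constraints pin values to `rax`, `rdi`, `rsi` and `rdx`. The clobber list tells the compiler that `rcx`, `r11` and memory may change across the call.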
We may then make a call to this assembly routine from C using the following
function signature:
```c
void* syscall(
    void* syscall_number,
    void* param1,
    void* param2,
    void* param3,
    void* param4,
    void* param5
);
```
To write to the standard output stream we invoke syscall `0x1` (`sys_write`),
which writes a buffer to a file descriptor. A useful x86_64 Linux syscall table
can be found
[here](https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/).
Syscall `0x1` takes three arguments and returns the number of bytes written, or
a negative error code. It has the following signature:
`sys_write(unsigned int fd, const char *buf, size_t count)`
A file called `base.c` was created, implementing both the syscall and print
wrappers:
```c
// base.c
typedef unsigned long int uintptr;
typedef long int intptr;

/* implemented in boot.s */
void* syscall(
    void* syscall_number,
    void* param1,
    void* param2,
    void* param3,
    void* param4,
    void* param5
);

static intptr print(void const* data, uintptr nbytes)
{
    return (intptr)
        syscall(
            (void*)1,         /* sys_write */
            (void*)(intptr)1, /* STDOUT */
            (void*)data,
            (void*)nbytes,
            0,
            0
        );
}

int main(int argc, char *argv[]){
    print("Hello, World", 12);
    return 0;
}
```
To instruct GCC not to link in the standard library and its startup files, the
`-nostdlib` flag should be passed to GCC. There is one caveat, however: certain
symbols, such as `_start`, which handle program startup and pass the command
line arguments to `main`, are then left up to us to implement. Otherwise we
will segfault :-/
Fortunately, this is quite trivial, as program initialisation is well defined by
the [System V ABI - Section
3.4](https://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf).
On entry to `_start`, `rsp` points to the argument count, and the address
`rsp+0x8` holds an array of 64-bit pointers to the argument strings. This array
is terminated by a NULL pointer and followed by the environment pointers.
From here, the argument count and the address of the argument pointer array can
be passed in `rdi` and `rsi` respectively, the first two parameters of
`main()`. When `main` returns, syscall `0x3c` (`exit`) is invoked with its
return value to terminate the program gracefully.
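The initial stack layout described above can be modelled in plain C. The sketch below treats the stack as an array of 8-byte `char *` slots and decodes it with a hypothetical `decode_initial_stack` helper. It uses a fake stack array, so it runs in an ordinary hosted program rather than from a real `_start`. It mirrors what `pop rdi` and `mov rsi,rsp` do in the startup code, but it is an illustration, not part of the author's code.

```c
#include <stdint.h>

/* What _start finds on the stack (System V x86_64 ABI, Section 3.4). */
struct start_info {
    long argc;    /* value at [rsp] */
    char **argv;  /* rsp + 8: argc pointers, then a NULL terminator */
    char **envp;  /* after argv's NULL: environment, NULL-terminated */
};

/* Hypothetical helper: `sp` is the value rsp held on entry to _start. */
struct start_info decode_initial_stack(char **sp)
{
    struct start_info si;
    si.argc = (long)(intptr_t)sp[0];  /* like `pop rdi` */
    si.argv = sp + 1;                 /* like `mov rsi,rsp` after the pop */
    si.envp = si.argv + si.argc + 1;  /* skip argv[] and its NULL */
    return si;
}
```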
Both the syscall and program startup assembly wrappers (written in GAS) were
placed in a file called `boot.s`:
```
/* boot.s */
.intel_syntax noprefix
.text
.globl _start, syscall

_start:
    xor rbp,rbp     /* rbp = 0: marks the outermost stack frame */
    pop rdi         /* rdi = argc, rsp = rsp + 8 */
    mov rsi,rsp     /* rsi = char *argv[] */
    and rsp,-16     /* align rsp to 16 bytes before the call */
    call main
    mov rdi,rax     /* rdi = main return value (exit status) */
    mov rax,60      /* syscall = 0x3c (exit) */
    syscall
    ret             /* never reached */

syscall:
    mov rax,rdi
    mov rdi,rsi
    mov rsi,rdx
    mov rdx,rcx
    mov r10,r8
    mov r8,r9
    syscall
    ret
```
Finally gcc was invoked with `gcc base.c boot.s -nostdlib -o base`
![File size of base after linking with -nostdlib](https://lunarjournal.github.io/images/2/05.png)
Wait, what!? We still get a ~14 KB executable after all that work? Yep. Although
we have slimmed down the object code for our example, we have not yet stripped
out the redundant ELF sections, and the padding the default linker script
inserts to page-align them. Together these make up the majority of the file
size.
# Custom Linker Script
Although it is possible to strip some redundant sections from an ELF binary
using `strip`, it is much more effective to use a custom linker script.
A linker script specifies how input sections are laid out in the output binary,
which means we can eliminate *almost* all redundancy. Care must be taken,
however, to ensure that essential sections such as `.text`, `.data` and
`.rodata*` are not discarded during linking, to avoid a segmentation fault.
The linker script that I came up with is shown below (`x86_64.ld`):
```
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
"elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SECTIONS
{
. = 0x400000 + SIZEOF_HEADERS;
.text : { *(.text) *(.data*) *(.rodata*) *(.bss*) }
}
```
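The linker script sets the virtual base address of the output binary to 0x400000
(plus the size of the headers). It also merges the code, data, read-only data
and bss input sections into a single `.text` output section. Input sections the
script does not mention are still placed by the linker as orphan sections.
Because the merged section is both writable and executable, the resulting load
segment is RWX, which recent versions of GNU ld warn about.
Custom linker scripts are passed to GCC with the `-T` switch, and the resulting
binary was built with: `gcc -T x86_64.ld base.c boot.s -nostdlib -o base`
This produced an output executable of around **~2.7 KB**.
This is much better, but there is still some room for improvement using
additional GCC compile-time switches.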
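To check what the linker script produced, the sketch below lists the `PT_LOAD` entries of the program header table (the segments the loader maps), similar to a small part of `readelf -l`. It assumes a 64-bit ELF file and glibc's `<elf.h>`. `dump_load_segments` is a hypothetical helper. On the output of the linker script above, one would expect virtual addresses at or just above 0x400000.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every PT_LOAD segment of a 64-bit ELF file
   with its file offset, virtual address, file size and R/W/X flags.
   Returns the number of PT_LOAD segments, or -1 on error. */
int dump_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    Elf64_Ehdr eh;
    int count = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr) &&
        fseek(f, (long)eh.e_phoff, SEEK_SET) == 0) {
        count = 0;
        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;
                break;
            }
            if (ph.p_type != PT_LOAD)
                continue;
            printf("LOAD offset=0x%06llx vaddr=0x%08llx filesz=0x%llx %c%c%c\n",
                   (unsigned long long)ph.p_offset,
                   (unsigned long long)ph.p_vaddr,
                   (unsigned long long)ph.p_filesz,
                   (ph.p_flags & PF_R) ? 'R' : '-',
                   (ph.p_flags & PF_W) ? 'W' : '-',
                   (ph.p_flags & PF_X) ? 'X' : '-');
            count++;
        }
    }
    fclose(f);
    return count;
}
```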
# GCC Flags
We have so far managed to shrink our executable down to ~2.7 KB from our
initial file size of ~17 KB, by stripping redundant section data with a custom
linker script and removing standard library object code.
However, GCC has several compile-time flags that can help remove unwanted code
and sections even further. These include:
| flag | description |
|----------------------|---------------------------------------|
| -ffunction-sections | place each function into its own section |
| -fdata-sections | place each data item into its own section |
| -Wl,--gc-sections | let the linker discard unreferenced sections |
| -fno-unwind-tables | do not generate unwind tables (on x86_64, `-fno-asynchronous-unwind-tables` is usually also needed to drop `.eh_frame`) |
| -Wl,--build-id=none | omit the `.note.gnu.build-id` section |
| -Qn | do not emit `.ident` directives (the compiler version string in `.comment`) |
| -Os | optimise code for size |
| -s | strip the symbol table and relocation information |
Compiling our example again with: `gcc -T x86_64.ld base.c boot.s -nostdlib -o
base -ffunction-sections -fdata-sections -Wl,--gc-sections -fno-unwind-tables
-Wl,--build-id=none -Qn -Os -s`
This produces an output executable with a size of **~1.5 KB**, but we can still
go further!
Additionally, you can add the `-static` switch to produce a statically linked
binary, which drops the sections needed for dynamic linking. This results in an
output executable of **~640 bytes**.
# SSTRIP
Despite all our optimisation so far, there are still a few redundant code and
data sections in our dynamically linked output executable (built without
`-static`). Enter sstrip...
[sstrip](https://github.com/aunali1/super-strip) is a useful utility that uses
the program header table to identify which parts of an ELF binary are loaded
into memory during program execution. Based on this, everything after the last
loaded byte, including the section header table, is removed. It is comparable
to `strip` but removes data much more aggressively.
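The core idea can be sketched in C. The sketch walks the program header table and finds the furthest file offset reached by any segment or by the headers themselves. Everything after that offset is not part of the memory image. This is a simplified illustration of the idea, not sstrip's actual code. `loaded_prefix_size` and `file_size` are hypothetical helpers, and the sketch assumes a 64-bit ELF file and glibc's `<elf.h>`.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: size of the file prefix that the ELF header, the
   program header table and all segments cover. Returns -1 on error. */
long long loaded_prefix_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    Elf64_Ehdr eh;
    long long end = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr) &&
        fseek(f, (long)eh.e_phoff, SEEK_SET) == 0) {
        /* The ELF header and program header table must always be kept. */
        end = (long long)(eh.e_phoff + (Elf64_Off)eh.e_phnum * eh.e_phentsize);
        if (end < (long long)sizeof eh)
            end = (long long)sizeof eh;
        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) {
                end = -1;
                break;
            }
            long long seg_end = (long long)(ph.p_offset + ph.p_filesz);
            if (seg_end > end)
                end = seg_end;
        }
    }
    fclose(f);
    return end;
}

/* Hypothetical helper: total file size in bytes, or -1 on error. */
long long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long long n = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        n = ftell(f);
    fclose(f);
    return n;
}
```

Comparing `loaded_prefix_size("base")` with `file_size("base")` before running sstrip gives a rough idea of how many trailing bytes it can remove.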
Running `./sstrip base`, we get our final executable binary with a size of
**~830 bytes**!
At this point it would probably be best to switch to assembly to get smaller
file sizes. However, the goal of this journal was to create small executables
written in C, and I think we've done quite well to reduce the size from ~17 KB
down to ~830 bytes!
![Final size of base after sstrip](https://lunarjournal.github.io/images/2/08.png)
As a final comment, you might be wondering whether we could simply have run
`sstrip` on our original ~17 KB executable in the first place. The answer is no.
I tried this and ended up with a binary image of around ~12 KB, so it seems
sstrip needs some additional assistance, in the form of our manual
optimisations, to get really tiny binaries!
# Source Code
Source code used in this journal is available at:
[https://github.com/lunarjournal/tinybase](https://github.com/lunarjournal/tinybase)