An Introduction To Low-level Programming: Building a Gameboy Emulator in C

Goals

This book has 2 distinct, largely orthogonal, though at times slightly conflicting, objectives:

- First, it aims to be an ideal introduction to everything low-level. You should come away with a firm intuition and mental model of the low level (we'll describe what we mean by "low level" shortly).
- Second, and on equal footing, by working on a real, reasonably large project, it aspires to teach, hands-on, 2 foundational aspects of software construction: modularity and simplicity.

(The conflict arises when we deliberately, if slightly, over-complicate some aspect of the design or implementation in order to show an interesting low-level detail. Whenever we do this, it will be noted.)

First Goal: What We Mean By "Low Level"

- The first sense of "low level" is exposure to the hardware. This includes the CPU's instruction set architecture (ISA), seen through some form of assembly language, but also how the CPU addresses memory and I/O, how it handles interrupts, etc.
- A second meaning of "low level" is related to the C programming language: for example, we are interested in understanding how memory is allocated (whether we allocate it manually on the heap, or it is allocated and released automatically on the stack as functions are called and return).
- The third and last sense of "low level" that we are interested in here is using the lowest-level interface possible to get things done.
  For example:
  - we interface directly with the Wayland compositor through libwayland, instead of using higher-level APIs such as libSDL;
  - we instruct the C compiler to create a special section in the resulting object file;
  - we interface directly with the OS through system calls, such as epoll in our event-driven design;
  - we use pthreads, with a simple example of a mutex and a read/write lock, to synchronize between threads;
  - we talk to libevdev to get input events (the key presses from the keyboard);
  - we use pixman for rendering;
  - we install signal handlers to show the concept of signals, which are part of how Unix/Linux processes are structured;
  - we build 3 different binaries, one of which is a shared library that implements a simple IPC protocol over sockets.

Second Goal: Modularity and Simplicity

It is important to be exposed as early as possible to the right intuitions for solid software development. This can take the form of learning the design patterns relevant to a particular domain and to the development tools available. Be it Java, C++ or Matlab, over time experience leads each community to develop sound patterns for recurring problems: what you're doing has probably been done thousands of times before, at least in structure and form (the particulars are, of course, unique). Regardless of the specific technologies involved, at least 2 aspects are ubiquitous in every successful and beautiful software system: modularity and simplicity.

Modularity

This is, in my opinion, perhaps the single most important concept to understand and develop in software engineering. It is intimately related to the idea of "abstraction". The classic example of this is the universality of the file abstraction at the core of the so-called "Unix philosophy" (which, of course, Linux has inherited): regular files, devices, pipes and sockets are all accessed through file descriptors, using the same small set of operations such as read and write. It has been so successful that its very ubiquity makes it hard for us to take a step back and acknowledge its genius.
The file abstraction is not perfect, of course; nothing in engineering is, since everything is a tradeoff. But it does serve as a perfect illustration of what good modular design should aim for: useful abstractions exposed through relatively simple interfaces.

Let's take a moment to think about other great abstractions across the whole spectrum of computing:

- The OS provides the "process" abstraction, which encapsulates the information needed for an instance of a program to execute on a particular piece of hardware.
- The OS also provides the virtual memory abstraction, which gives each process the illusion of having its own private address space.

Using C will let us appreciate the importance of modularity, since its rawness (here meaning its lack of explicit, modern support for modularity, such as modules or namespaces) means that we need to take extra care in how we organize our code.

Simplicity

It can be something of an epiphany to realize that genius code is simple, obvious code. One explanation I find plausible goes like this: software is inherently complex, so software development can be described as a battle against entropy; chaos is bad and ugly, and its opposite, order, is good and beautiful; therefore, whatever order we can achieve must be the driving principle of design and development. In this sense, it is interesting to note the interrelation between aesthetics and ethics. Arguably, simplicity is at the core here, in the following sense: the first line of defense against complexity is to avoid it in the first place! Of course, this needs to be nuanced, as the famous quote "everything should be made as simple as possible, but not simpler" suggests. In this book I hope to show some of the ways in which code can be simple. And again, the rawness of C forces us to be extra mindful of this.

The Plan

We are going to go through the design and implementation of a C program, using low-level interfaces to get things done. But which program should it be?
The app we are going to build should lend itself to maximizing our learning experience according to our goals. The choice should tick the following boxes:

- It should be neither as simple as a toy program nor as complex as an enterprise-level app: so no calculators and no SAP applications.
- It should be useful, defined here as something you would reasonably want to run every now and then.
- It should expose aspects of the low level, in all of the senses described in [What We Mean By "Low Level"].

I think I've hit the sweet spot: we'll design and implement a Gameboy emulator!

First, the Gameboy is simple enough. In fact, it is a commonly recommended target for people starting out in emulation. The NES, not to mention the SNES and later consoles, is too complex for our purposes.

Second, implementing an emulator (or, more generally, a virtual machine) is a great exercise in understanding the low level: you must understand the target architecture well enough to, well, emulate its behavior. This includes the CPU instruction set, how the CPU addresses memory and I/O, interrupts, and a bunch of other details.

And finally, I think the Gameboy also hits a sweet spot in terms of "usefulness", defined here as something you would actually want to use more than once. Simpler consoles like the Atari 2600 don't quite make it in this sense: as much as I appreciate old games, I can't see myself wanting to play its game library for more than 5 minutes (I did enjoy completing Adventure, but I couldn't get immersed in any other game I tried). On the other hand, a Gameboy will let us enjoy the work of the geniuses behind games like Pokemon Red, 3 Donkey Kong Land games, 5 Mega Man games, 2 Mario games, the Final Fantasy stories, Contra, Prince of Persia and more. Now that is what I call a useful app!
(Of course, this is entirely subjective and biased; I would perfectly understand someone who couldn't stand more than 5 minutes of any of these games.)

Here are some of the more interesting things we'll implement:

- The project consists of 3 different binaries:
  - realboy: the emulator itself, which also implements the server side of libemu.
  - realboy-monitor: an executable through which we'll be able to control realboy in various ways, acting as the client side of libemu.
  - libemu: an IPC library that enables the client (realboy-monitor) to communicate with the server (realboy) using a simple protocol over Unix domain sockets.
- We'll use 2 POSIX threads (pthreads):
  - The main thread will execute the virtual machine interpreter.
  - A second thread will use Linux's epoll interface to listen for various kinds of events: Wayland, PipeWire and libevdev events. Here we will use a mutex to protect access to some shared variables.
  - We'll also use a read/write lock to avoid having to continuously poll a file descriptor, substantially improving performance.
- We'll install a signal handler for SIGINT, to understand how signals work.
- We'll handle the 2 aspects of graphics: rendering and presentation.
  - A rendering driver built on pixman, applying a simple transform and filter.
  - A Wayland driver that talks directly to the Wayland compositor to present the rendered pixels.
- We'll see how easy a modern build system like Meson makes it to build all of this.

Why C?

One reason for choosing C has already been hinted at in previous sections: C is raw. This rawness implies that extra care must be taken in every aspect of the system being built. This, I believe, is a very good pedagogical reason for choosing it.

But more importantly, C is the lingua franca of computing, the foundation on which much of the computing world rests. This is hardly in doubt, but let me refer you to a very good article that I think explains it very well
(https://faultlore.com/blah/c-isnt-a-language/).

My hot take is that C is still the most effective way to introduce systems programming in 2025. You will only benefit from learning C, with all its (very obvious, in retrospect) faults and limitations.

Finally, C is simple. This is sometimes disputed by pointing to the mess of its integer types and their promotion and conversion rules, and to other ways in which, by today's standards, its design is undoubtedly arcane and archaic. But I still think that C is fundamentally simple, if only in the obvious sense of its syntax: having fewer elements in the language naturally makes it simpler. This may be hard to appreciate, but it becomes evident when, at any given moment, everything that is happening in your program fits entirely in your head (the antithesis of languages like C++ and Rust). Don't get me wrong: Rust is awesome, and I have great respect for C++ and its history, but you will notice that in their communities, discussions about how to accomplish something often quickly devolve into discussions about the language itself. I really like Zig's ethos, evident in its slogan: "Focus on debugging your application rather than debugging your programming language knowledge." Why not Zig, then? Unfortunately, I believe Zig has not yet crossed the threshold of being a realistic option in the systems programming landscape (a threshold that I believe Rust, for example, has crossed).

No-AI Policy

Building intuition and understanding takes effort and time. There is no AI substitute for this. So, for no reason other than pedagogy, we adhere to a strict no-AI policy: nothing in this project has been done through AI, and contributions will only be considered if no AI was involved.