An overview of memory management in QEMU:

I. RAM Management:
==================

I.1. RAM Address space:
-----------------------

All pages of virtual RAM used by QEMU at runtime are allocated from
contiguous blocks in a specific abstract "RAM address space".
|ram_addr_t| is the type of block addresses in this space.

A single block of contiguous RAM is allocated with 'qemu_ram_alloc()', which
takes a size in bytes, and allocates the pages through mmap() in the QEMU
host process. It also sets up the corresponding KVM / Xen / HAX mappings,
depending on each accelerator's specific needs.

Each block has a name, which is used for snapshot support.

'qemu_ram_alloc_from_ptr()' can also be used to allocate a new RAM
block, by passing its content explicitly (this can be useful for pages
of ROM).

'qemu_get_ram_ptr()' translates a 'ram_addr_t' into the corresponding
address in the QEMU host process. 'qemu_ram_addr_from_host()' does the
opposite (i.e. it translates a host address into a ram_addr_t if possible,
or returns an error).

Note that ram_addr_t addresses are an internal implementation detail of
QEMU, i.e. the virtual CPU never sees their values directly; it relies
instead on addresses in its virtual physical address space, described
in section II. below.

As an example, when emulating an Android/x86 virtual device, the following
RAM space is being used:

  0x0000_0000 ... 0x1000_0000   "pc.ram"
  0x1000_0000 ... 0x1002_0000   "bios.bin"
  0x1002_0000 ... 0x1004_0000   "pc.rom"


I.2. RAM Dirty tracking:
------------------------

QEMU also associates with each RAM page an 8-bit 'dirty' bitmap. The
main idea is that whenever a page is written to, the value 0xff is
written to the page's 'dirty' bitmap. Various clients can later inspect
some of the flags and clear them. For example:

  VGA_DIRTY_FLAG (0x1) is typically used by framebuffer drivers to detect
  which pages of video RAM were touched since the latest VSYNC.
  The driver typically copies the pixel values to the real QEMU output, then
  clears the bits. This is very useful to avoid needless copies if nothing
  changed in the framebuffer.

  MIGRATION_DIRTY_FLAG (0x8) is used to track modified RAM pages during
  live migration (i.e. moving a QEMU virtual machine from one host to
  another).

  CODE_DIRTY_FLAG (0x2) is a bit more special, and is used to support
  self-modifying code properly. More on this later.


II. The physical address space:
===============================

This is the address space that the virtual CPU can read from / write to.
|hwaddr| is the type of addresses in this space, which is decomposed
into 'pages'. Each page in the address space is either unassigned, or
mapped to a specific kind of memory region.

See |phys_page_find()| and |phys_page_find_alloc()| in translate-all.c for
the implementation details.


II.1. Memory region types:
--------------------------

There are several memory region types:

  - Regions of RAM pages.

  - Regions of ROM pages (similar to RAM, but cannot be written to).

  - Regions of I/O pages, used to communicate with virtual hardware.

Virtual devices can register a new I/O region type by calling
|cpu_register_io_memory()|. This function allows them to provide
callbacks that will be invoked every time the virtual CPU reads from
or writes to any page of the corresponding type.

The memory region type of a given page is encoded using PAGE_BITS bits
in the following format:

    +-------------------------------+
    |  mem_type_index       | flags |
    +-------------------------------+

Where |mem_type_index| is a unique value identifying a given memory
region type, and |flags| is a 3-bit bitmap used to store flags that are
only relevant for I/O pages.

The following memory region type values are important:

  IO_MEM_RAM (mem_type_index=0, flags=0):
      Used for regular RAM pages, always all zero on purpose.

  IO_MEM_ROM (mem_type_index=1, flags=0):
      Used for ROM pages.
  IO_MEM_UNASSIGNED (mem_type_index=2, flags=0):
      Used to identify unassigned pages of the physical address space.

  IO_MEM_NOTDIRTY (mem_type_index=3, flags=0):
      Used to implement tracking of dirty RAM pages. This is essentially
      used for RAM pages that have not been written to yet.

Any mem_type_index value of 4 or higher corresponds to a device-specific
I/O memory region type (i.e. with custom read/write callbacks and a
corresponding 'opaque' value), and can also use the following bits
in |flags|:

  IO_MEM_ROMD (0x1):
      Used for ROM-like I/O pages, i.e. they are backed by a page from
      the RAM address space, but writing to them triggers a device-specific
      write callback (instead of being ignored or faulting the CPU).

  IO_MEM_SUBPAGE (0x02):
      Used to indicate that not all addresses in this page map to the same
      I/O region type / callbacks.

  IO_MEM_SUBWIDTH (0x04):
      Probably obsolete. Set to indicate that the corresponding I/O region
      type doesn't support reading/writing values of all possible sizes
      (1, 2 and 4 bytes). This never seems to be used by the current code.

Note that cpu_register_io_memory() returns a new memory region type value.


II.2. Physical address map:
---------------------------

QEMU maintains for each assigned page in the physical address space
two values:

  |phys_offset|, a combination of RAM address and memory region type.

  |region_offset|, an optional offset into the region backing the
  page. This is only useful for I/O pages.

The |phys_offset| value has several encodings that require further
clarification:

  - Generally speaking, a phys_offset value is decomposed into
    the following bit fields:

      +-----------------------------------------------------+
      | high_addr                         | mem_type        |
      +-----------------------------------------------------+

    where |mem_type| is a PAGE_BITS memory region type as described
    previously, and |high_addr| may contain the high bits of a
    ram_addr_t address for RAM-backed pages.
    More specifically:

  - Unassigned pages always have the special value IO_MEM_UNASSIGNED
    (high_addr=0, mem_type=IO_MEM_UNASSIGNED).

  - RAM pages have mem_type=0 (i.e. IO_MEM_RAM) while high_addr holds
    the high bits of the corresponding ram_addr_t. Hence, a simple call to
    qemu_get_ram_ptr(phys_offset) will return the corresponding
    address in host QEMU memory.

    This is the reason why IO_MEM_RAM is always 0:

    RAM page phys_offset value:
      +-----------------------------------------------------+
      | high_addr                         | 0               |
      +-----------------------------------------------------+

  - ROM pages are like RAM pages, but have mem_type=IO_MEM_ROM.
    QEMU ensures that writing to such a page is a no-op, except on
    some target architectures, like Sparc, where this may cause a CPU
    fault.

    ROM page phys_offset value:
      +-----------------------------------------------------+
      | high_addr                         | IO_MEM_ROM      |
      +-----------------------------------------------------+

  - Dirty RAM page tracking is implemented by using special
    phys_offset values with mem_type=IO_MEM_NOTDIRTY. Note that these
    values do not appear directly in the physical page map, but in
    the CPU TLB cache (explained later).

    non-dirty RAM page phys_offset value (CPU TLB cache only):
      +-----------------------------------------------------+
      | high_addr                         | IO_MEM_NOTDIRTY |
      +-----------------------------------------------------+

  - Other pages are I/O pages, and their high_addr value will
    be 0 / ignored:

    I/O page phys_offset value:
      +----------------------------------------------------------+
      | 0                            | mem_type_index   | flags  |
      +----------------------------------------------------------+

    Note that when reading from or writing to I/O pages, the lowest
    PAGE_BITS bits of the corresponding hwaddr value will be added
    to the page's |region_offset| value. This new address is passed
    to the read/write callback as the 'i/o address' for the operation.
  - As a special exception, if the I/O page's IO_MEM_ROMD flag is
    set, then high_addr is not 0, but holds the high bits of the
    ram_addr_t backing the page's contents on reads.
    On write operations though, the I/O region type's write callback
    will be called instead.

    ROMD I/O page phys_offset value:
      +----------------------------------------------------------+
      | high_addr                    | mem_type_index   | flags  |
      +----------------------------------------------------------+

    Note that |region_offset| is ignored when reading from such pages;
    it is only used when writing to the I/O page.
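
The phys_offset encodings listed in section II.2 can be summarized in a
short C sketch. It is not taken from the QEMU sources: every sk_* name is
hypothetical, PAGE_BITS is assumed to be 12 (4 KiB pages), and the placement
of |flags| in the lowest 3 bits with |mem_type_index| directly above it is
inferred from the diagrams in this document. Under those assumptions, it
splits a phys_offset value into its fields, classifies the page, and
computes the 'i/o address' handed to an I/O callback.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch only; the real ones depend on the target. */
#define SK_PAGE_BITS  12                                  /* assumed page size: 4 KiB */
#define SK_PAGE_MASK  ((UINT64_C(1) << SK_PAGE_BITS) - 1) /* low PAGE_BITS bits */
#define SK_FLAG_BITS  3                                   /* |flags| is a 3-bit bitmap */
#define SK_FLAG_MASK  ((1u << SK_FLAG_BITS) - 1)
#define SK_ROMD       0x1u                                /* IO_MEM_ROMD flag bit */

/* Hypothetical classification of a page, following the cases in the text. */
typedef enum {
    SK_KIND_RAM, SK_KIND_ROM, SK_KIND_UNASSIGNED, SK_KIND_NOTDIRTY,
    SK_KIND_IO, SK_KIND_ROMD
} sk_kind;

typedef struct {
    uint64_t high_addr;       /* phys_offset with the low PAGE_BITS cleared */
    unsigned mem_type_index;  /* 0..3 are the built-in types, 4+ are devices */
    unsigned flags;           /* only meaningful for I/O pages */
    sk_kind  kind;
} sk_decoded;

/* Build a phys_offset value from its three fields. */
uint64_t sk_encode(uint64_t high_addr, unsigned index, unsigned flags)
{
    assert((high_addr & SK_PAGE_MASK) == 0);   /* must be page-aligned */
    assert(flags <= SK_FLAG_MASK);
    assert(index < (1u << (SK_PAGE_BITS - SK_FLAG_BITS)));
    return high_addr | ((uint64_t)index << SK_FLAG_BITS) | flags;
}

/* Split a phys_offset value into its fields and classify the page. */
sk_decoded sk_decode(uint64_t phys_offset)
{
    sk_decoded d;
    unsigned mem_type = (unsigned)(phys_offset & SK_PAGE_MASK);

    d.high_addr      = phys_offset & ~SK_PAGE_MASK;
    d.mem_type_index = mem_type >> SK_FLAG_BITS;
    d.flags          = mem_type & SK_FLAG_MASK;

    switch (d.mem_type_index) {
    case 0:  d.kind = SK_KIND_RAM;        break;
    case 1:  d.kind = SK_KIND_ROM;        break;
    case 2:  d.kind = SK_KIND_UNASSIGNED; break;
    case 3:  d.kind = SK_KIND_NOTDIRTY;   break;
    default: /* device-specific I/O type; ROMD pages keep a RAM backing */
        d.kind = (d.flags & SK_ROMD) ? SK_KIND_ROMD : SK_KIND_IO;
        break;
    }
    return d;
}

/* 'i/o address' for a callback: region_offset plus the in-page offset. */
uint64_t sk_io_address(uint64_t hwaddr, uint64_t region_offset)
{
    return region_offset + (hwaddr & SK_PAGE_MASK);
}
```

With these assumed widths, sk_encode(0, 2, 0) gives 0x10 for an unassigned
page, and an access at hwaddr 0xfee00123 to an I/O page whose region_offset
is 0x4000 yields the i/o address 0x4123.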