x86:LE:32:default:gcc
and associating it with the file x86gcc.cspec
<compiler_spec> as the root XML tag. All specific compiler features are described using subtags to this tag. In principle, all the subtags are optional except the <default_proto> tag, but there is generally a minimum set of tags that are needed to create a useful specification. In general, the subtags can appear in any order. The only exception is that tags which define names, like <prototype>, must appear before other tags which use that name.
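The following is a minimal sketch of a complete file. It assumes a 32-bit x86-style processor whose SLEIGH spec defines the registers ESP and EAX; these are illustrative names, not requirements. Only <default_proto> is mandatory. The <stackpointer> tag is included because the stack-based <pentry> is assumed here to rely on the "stack" space that it sets up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<compiler_spec>
  <!-- Assumed x86-style stack pointer; lets the "stack" addresses below resolve -->
  <stackpointer register="ESP" space="ram"/>
  <!-- The only mandatory subtag: it wraps exactly one <prototype> -->
  <default_proto>
    <prototype name="__cdecl" extrapop="4" stackshift="4">
      <input>
        <!-- All parameters on the stack, just above the 4-byte return address -->
        <pentry minsize="1" maxsize="500" align="4">
          <addr offset="4" space="stack"/>
        </pentry>
      </input>
      <output>
        <!-- Return values of up to 4 bytes in EAX (assumed register name) -->
        <pentry minsize="1" maxsize="4">
          <register name="EAX"/>
        </pentry>
      </output>
    </prototype>
  </default_proto>
</compiler_spec>
```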
The <register> tag is used to specify formally named registers, usually defined by the SLEIGH specification for the processor. The name must be given in a name attribute.
The <varnode> tag is used to generically describe any varnode, including named registers, global RAM locations, and stack locations. It must take three attributes: space, the name of the address space; offset, the byte offset of the varnode within that space; and size, the number of bytes in the varnode.
For stack locations, the offset is interpreted relative to the function that is being decompiled or is otherwise in scope. An offset of 0, for instance, typically refers to the memory location on the stack being pointed to by the formal stack pointer register upon entry to the function being analyzed.
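For illustration, the two tags below name a register formally and describe a 4-byte stack slot generically. They assume an x86 SLEIGH spec that defines EAX. Stack offsets are relative to the entry value of the stack pointer, so the varnode refers to the word that the stack pointer addresses when the function is entered. Forms like these typically appear as children of tags such as <unaffected> or <returnaddress>.

```xml
<!-- Formal register, referenced by its SLEIGH name (assumed x86) -->
<register name="EAX"/>
<!-- Generic varnode: 4 bytes at offset 0 of the stack space, i.e. the slot
     the stack pointer points to on entry to the function -->
<varnode space="stack" offset="0" size="4"/>
```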
| Subtag | Description |
|---|---|
| <context_set> | (0 or more) Set a context variable across a region of memory |
| <tracked_set> | (0 or more) Set default value of register |
The <context_data> tag consists of zero or more <context_set> and <tracked_set> subtags, which allow certain values to be assumed by analysis.
| Subtag | Attribute | Description |
|---|---|---|
| | space | Name of address space |
| | first | (Optional) Starting offset of range |
| | last | (Optional) Ending offset of range |
| <set> | | Specify the context variable and the new value |
| | name | Name of the context variable |
| | val | Integer value being set |
| | description | (Optional) Description of what is set |
The <context_set> tag sets a SLEIGH context variable over a specified address range. This potentially affects how instructions are disassembled within that range. This is more commonly used in the … The space, first, and last attributes describe the range. Omitting first and/or last causes the range to start at the beginning and/or run to the end of the address space, respectively.
The <set>
subtag describes the variable and its setting.
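The following sketches a <context_set> entry. It assumes a hypothetical SLEIGH context variable named TMode, similar to one an ARM-style spec might define, and a hypothetical code range. Every instruction whose address falls between 0x10000 and 0x1ffff in the ram space would then be disassembled with TMode set to 1.

```xml
<context_data>
  <!-- Hypothetical range; omit first/last to cover the whole space -->
  <context_set space="ram" first="0x10000" last="0x1ffff">
    <!-- Hypothetical context variable name and value -->
    <set name="TMode" val="1" description="Assume alternate instruction encoding"/>
  </context_set>
</context_data>
```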
| Subtag | Attribute | Description |
|---|---|---|
| | space | Name of address space |
| | first | (Optional) Starting offset of range |
| | last | (Optional) Ending offset of range |
| <set> | | Specify the register and the new value |
| | name | Name of the register |
| | val | Integer value being set |
| | description | (Optional) Description of what is set |
The <tracked_set> tag informs the decompiler that a register takes a specific value for any function whose entry point is in the indicated range. Compilers sometimes know or assume that registers have specific values coming into the functions they produce. This tag allows the decompiler to make the same assumption and possibly use constant propagation to make further simplifications.
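The following sketch assumes an x86 SLEIGH spec that defines the direction-flag register DF. The common x86 ABIs require DF to be clear on function entry, so a compiler spec could record that assumption for every function in ram. With first and last omitted, the range is assumed here to cover the whole address space, just as it does for <context_set>.

```xml
<context_data>
  <!-- No first/last: assumed to apply to every function entry point in ram -->
  <tracked_set space="ram">
    <!-- DF is an assumed register name from an x86 SLEIGH spec -->
    <set name="DF" val="0" description="Direction flag clear on entry"/>
  </tracked_set>
</context_data>
```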
| Subtag | Attribute | Description |
|---|---|---|
| | name | The identifier for this callfixup |
| <target> | | (0 or more) Map this callfixup to a specific symbol |
| | name | The specific symbol name |
| <pcode> | | Description of p-code to inject. |
| | paramshift | (Optional) Integer for shifting parameters at the callpoint. |
| <body> | | P-code to inject. |
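The following sketches a <callfixup> for GCC's 32-bit position-independent-code thunks, which load their own return address into a register. It assumes an x86 SLEIGH spec with registers EBX and ESP. It also assumes that the CALL instruction semantics push the return address before the CALL p-code operation that the fixup replaces. The <target> names are the usual GCC thunk symbols, shown only as examples.

```xml
<callfixup name="get_pc_thunk_bx">
  <!-- Symbols whose calls are replaced by the p-code below -->
  <target name="__i686.get_pc_thunk.bx"/>
  <target name="__x86.get_pc_thunk.bx"/>
  <pcode>
    <!-- Load the return address pushed by the CALL into EBX, then pop it,
         mirroring what the thunk's own RET would do -->
    <body><![CDATA[
      EBX = *:4 ESP;
      ESP = ESP + 4;
    ]]></body>
  </pcode>
</callfixup>
```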
The name attribute can be used to identify the callfixup within the Ghidra CodeBrowser and manually force certain functions to be replaced. The name attribute of the <callfixup> tag and any optional <target> subtags identify function names which will …
The <body> subtag is fed directly to the SLEIGH semantic expression parser to create the p-code snippet.
Identifiers are interpreted as formal registers, if the register exists, but are otherwise interpreted as temporary registers in the …

| Subtag | Attribute | Description |
|---|---|---|
| | targetop | Name of the user-defined p-code operation |
| <pcode> | | Description of p-code to inject. |
| <input> | | (0 or more) Description of formal input parameter. |
| | name | Name of the specific input symbol. |
| | size | Expected size of the parameter in bytes. |
| <output> | | (0 or more) Description of formal output parameter. |
| | name | Name of the specific output symbol. |
| | size | Expected size of output in bytes. |
| <body> | | P-code to inject. |
The <callotherfixup> tag is similar to a <callfixup> tag but is used to describe injections that replace user-defined p-code operations, rather than CALL operations. User-defined p-code operations, referred to generically as CALLOTHER operations, are … The targetop attribute links the p-code described here to the specific operation via this name.
A CALLOTHER operation takes formal varnodes as inputs and/or outputs. These varnodes can be referred to in the injection <body> by predefining them using <input> or <output> tags. The sequence of <input> tags corresponds, in order, to the input parameters of the CALLOTHER, and an <output> tag corresponds to the output varnode, if present. The tags listed here … The size attribute in each tag will, if present, impose a size restriction on the parameter as well.
As with <callfixup>, the <body> tag is fed straight to the SLEIGH semantic parser. It can refer to registers via their symbolic names defined in SLEIGH, it can refer to the operator parameters via their <input> or <output> names, and it can also refer to inst_start and inst_next as addresses describing the instruction containing the CALLOTHER.
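The following sketches a <callotherfixup>. It assumes a hypothetical user-defined p-code operation named my_userop that takes one 4-byte input and produces one 4-byte output. The <input> and <output> names become identifiers in the <body> and are bound in order to the CALLOTHER's operands.

```xml
<callotherfixup targetop="my_userop">
  <pcode>
    <!-- First (and only) CALLOTHER input, restricted to 4 bytes -->
    <input name="src" size="4"/>
    <!-- The CALLOTHER's output varnode -->
    <output name="dst" size="4"/>
    <!-- Hypothetical semantics: bitwise complement of the input -->
    <body><![CDATA[
      dst = ~src;
    ]]></body>
  </pcode>
</callotherfixup>
```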
style | Strategy for splitting: |
|
(1 or more) |
signext | (Optional) |
| Subtag | Attribute | Description |
|---|---|---|
| <absolute_max_alignment> | value | (Optional) Maximum alignment possible across all datatypes (0 indicates no maximum) |
| <machine_alignment> | value | (Optional) Maximum useful alignment for the underlying architecture |
| <default_alignment> | value | (Optional) Default alignment for any datatype that isn't a structure, union, array, or pointer and whose size isn't in the size/alignment map |
| <default_pointer_alignment> | value | (Optional) Default alignment for a pointer that doesn't have a size |
| <pointer_size> | value | (Optional) Size of a pointer |
| <pointer_shift> | value | (Optional) Left-shift amount, in bits, for shifted pointer datatypes |
| <wchar_size> | value | (Optional) Size of "wchar", the wide character datatype |
| <short_size> | value | (Optional) Size of "short" and other short integer datatypes |
| <integer_size> | value | (Optional) Size of "int" and other integer datatypes |
| <long_size> | value | (Optional) Size of "long" and other long integer datatypes |
| <long_long_size> | value | (Optional) Size of "longlong" integer datatypes |
| <float_size> | value | (Optional) Size of "float" and other floating-point datatypes |
| <double_size> | value | (Optional) Size of "double" and other double precision floating-point datatypes |
| <long_double_size> | value | (Optional) Size of "longdouble" floating-point datatypes |
| <size_alignment_map> | | (Optional) Size to alignment map |
The <data_organization> tag provides information about the sizes of core datatypes and how the compiler typically aligns datatypes. This information is required so that analysis can determine the proper in-memory layout of datatypes, such as those described by C/C++ style header files. Both sizes and alignments are specified in bytes by using the integer value attribute in the corresponding tag. An alignment value indicates that the compiler chooses a byte address that is a multiple of that value as the start of that datatype. A value of 1 indicates no alignment restriction. Alignments for specific datatype sizes are listed in the <size_alignment_map>. If the size of a particular datatype isn't listed in the map, the <default_alignment> value will be used.
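The following sketches the size-related part of a <data_organization> tag. The values are illustrative only, for a hypothetical 32-bit target where int, long, and pointers are 4 bytes. Each value is given in bytes through the value attribute. An absolute_max_alignment of 0 means there is no maximum.

```xml
<data_organization>
  <!-- Illustrative values for a hypothetical 32-bit target -->
  <absolute_max_alignment value="0"/>
  <machine_alignment value="2"/>
  <default_alignment value="1"/>
  <default_pointer_alignment value="4"/>
  <pointer_size value="4"/>
  <wchar_size value="4"/>
  <short_size value="2"/>
  <integer_size value="4"/>
  <long_size value="4"/>
  <long_long_size value="8"/>
  <float_size value="4"/>
  <double_size value="8"/>
  <long_double_size value="10"/>
</data_organization>
```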
| Subtag | Attribute | Description |
|---|---|---|
| <entry> | | (0 or more) Alignment information for a particular size |
| | size | Size of datatype in bytes |
| | alignment | The alignment value |
Each <entry> maps a specific size to a specific alignment. Ghidra satisfies requests for the alignment of all atomic datatypes (except pointers) by consulting this map. If the map doesn't contain the particular size, Ghidra reverts to the <default_alignment> subtag in the parent <data_organization> tag. It is typical to only provide alignments for sizes that are a power of 2.
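For the same hypothetical target, a <size_alignment_map> might align each power-of-2 size to itself, except for 8-byte values. This sketch assumes those are only 4-byte aligned. A request for a size missing from the map, such as 16, would fall back to the <default_alignment> value.

```xml
<data_organization>
  <!-- Fallback for any size not listed in the map -->
  <default_alignment value="1"/>
  <size_alignment_map>
    <entry size="1" alignment="1"/>
    <entry size="2" alignment="2"/>
    <entry size="4" alignment="4"/>
    <!-- Assumed: 8-byte scalars start on 4-byte boundaries -->
    <entry size="8" alignment="4"/>
  </size_alignment_map>
</data_organization>
```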
| Attribute | Description |
|---|---|
| size | Default size of an enumerated datatype |
| signed | (Optional) |

| Attribute | Description |
|---|---|
| align | Number of alignment bytes for functions |

The align attribute should always be a power of 2 corresponding to the number of bits a compiler might use for additional storage.
| Subtag | Attribute | Description |
|---|---|---|
| <register> | | (0 or more) Specific register to be marked as global |
| | name | Name of register |
| <range> | | (0 or more) Range of addresses to be marked as global |
| | space | Address space of the global region |
| | first | (Optional) Starting offset of the region |
| | last | (Optional) Ending offset of the region |
The <global> tag marks specific memory regions as storage locations for the compiler's global variables. The word …

| Subtag | Attribute | Description |
|---|---|---|
| <register> | | (0 or more) Specific register to be marked as read-only |
| | name | Name of register |
| <range> | | (0 or more) Range of addresses to be marked as read-only |
| | space | Address space of the read-only region |
| | first | (Optional) Starting offset of the region |
| | last | (Optional) Ending offset of the region |
The <readonly> tag labels a specific region as read-only. From the point of view of the compiler, these memory locations hold constant values. This allows the decompiler to propagate these constants and potentially perform additional simplification. This tag is not very common because most read-only memory sections are determined dynamically from the executable header.
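The following sketches both tags, assuming a ram address space. The <global> range omits first and last, which is assumed here to mean that all of ram may hold global variables. The <readonly> range marks a hypothetical constant table at 0x8000 through 0x8fff, whose contents the decompiler could then propagate as constants.

```xml
<global>
  <!-- No first/last: assumed to cover the whole ram space -->
  <range space="ram"/>
</global>
<readonly>
  <!-- Hypothetical constant table the compiler never writes -->
  <range space="ram" first="0x8000" last="0x8fff"/>
</readonly>
```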
| Subtag | Attribute | Description |
|---|---|---|
| <register> | | (0 or more) Specific register to be marked as not addressable |
| | name | Name of register |
| <range> | | (0 or more) Range of addresses to be marked as not addressable |
| | space | Address space of the unaddressable region |
| | first | (Optional) Starting offset of the region |
| | last | (Optional) Ending offset of the region |
The <nohighptr> tag describes a memory region into which the compiler does not expect to see pointers from any high-level source code. This is slightly different from saying that there are absolutely no indirect references into the region. This tag is really intended to partly address the modeling of …

| Attribute | Description |
|---|---|
| register | Name of register to use as stack pointer |
| space | Address space that will hold the stack |
| growth | (Optional) |
| reversejustify | (Optional) |
The <stackpointer> tag informs Ghidra of the main stack mechanism for the compiler. The register attribute gives the name of the register that holds the current offset into the stack, and the space attribute specifies the name of the address space that holds the actual data. This tag triggers the
creation of a formal growth
attribute to One |
The <returnaddress> tag can help by making the standard storage location explicit.
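The following sketch is for a 32-bit x86-style target. It assumes the SLEIGH register ESP and stack data held in the ram space. On such a target, the return address is the 4-byte word at stack offset 0 on function entry, and the <returnaddress> tag states this explicitly. The growth and reversejustify attributes are left at their defaults.

```xml
<!-- Assumed register and space names -->
<stackpointer register="ESP" space="ram"/>
<returnaddress>
  <!-- Word addressed by the stack pointer on function entry -->
  <varnode space="stack" offset="0" size="4"/>
</returnaddress>
```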
The <prototype> tag encodes details about a specific prototype model within a compiler specification. A given compiler spec can have multiple prototype models, which are all distinguished by the mandatory name attribute. The <prototype> tags must include the subtags <input> and <output>, which list storage locations (registers, stack, and other varnodes) as the raw material for the prototype model to decide where parameters are stored for passing between functions. The <input> tag holds the resources used to pass input parameters, and <output> describes resources for return value storage. A resource is described by the <pentry> tag, which comes in two flavors. Most <pentry> tags describe a storage location to be used by a single variable. If the tag has an align attribute, multiple variables can be drawn from the same resource, as is typical for a stack region.
How the <pentry> resources are used is determined by the strategy attribute of the prototype model's <prototype> tag. There are currently only two strategies:
In the standard strategy, the <pentry> subtags under the <input> tag are viewed as an ordered resource list. When assigning storage locations from a list of datatypes, each datatype is evaluated in order. The first <pentry> from the resource list that fits the datatype and hasn't been fully used by previous datatypes is assigned to that datatype. In this case, the <input> tag lists varnodes in the order that a compiler would dole them out when given a list of parameters to pass. Integer or pointer values are usually passed first in specially designated registers, and then on the stack if there are not enough available registers. There can be one stack-based <pentry> at the end of the list that will typically match any number of parameters of any size or type.
If there are <pentry> tags for dedicated floating-point registers, the standard strategy treats them as a separate resource list, independent of the one for integer and pointer datatypes. The <pentry> tags specifying floating-point registers are listed in the same <input> tag, immediately after the integer registers, and are distinguished by the metatype="float" attribute labeling the individual tags.
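The following sketches an <input> list under the standard strategy. The register names (RDI, RSI, XMM0_Qa, XMM1_Qa) are assumptions modeled on an x86-64 SLEIGH spec. Integer registers come first, then floating-point registers marked with metatype="float", and finally a catch-all stack entry.

```xml
<input>
  <!-- Integer/pointer resource list, handed out in order -->
  <pentry minsize="1" maxsize="8">
    <register name="RDI"/>
  </pentry>
  <pentry minsize="1" maxsize="8">
    <register name="RSI"/>
  </pentry>
  <!-- Listed right after the integer registers; metatype="float"
       pulls these into their own resource list -->
  <pentry minsize="4" maxsize="8" metatype="float">
    <register name="XMM0_Qa"/>
  </pentry>
  <pentry minsize="4" maxsize="8" metatype="float">
    <register name="XMM1_Qa"/>
  </pentry>
  <!-- Single stack entry at the end, matching any remaining parameters -->
  <pentry minsize="1" maxsize="500" align="8">
    <addr offset="8" space="stack"/>
  </pentry>
</input>
```

With this list, a hypothetical signature (long, double, long) would be assigned RDI, XMM0_Qa, and RSI. The double draws from the separate floating-point list, so it does not consume an integer register.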
<pentry> resource list. If there is a gap, i.e., the second <pentry> occurs as a varnode but not the first, then the decompiler will fill in the gap by creating an extra … <pentry> tags for any register that might conceivably be considered an input location. Then the input varnodes for a function that have a corresponding <pentry> are automatically promoted to formal parameters. In practical terms, this strategy behaves in the same way as the Standard strategy, except that in the reverse case, the decompiler does not care about gaps in the resource list. It will not fill in gaps, and it will not throw out putative inputs because of large gaps. … <pentry> that hasn't been used and that fits the datatype is assigned. Note that this may not make as much sense for hand-coded assembly.
| Subtag | Description |
|---|---|
| <prototype> | Specification for the default prototype |

The <default_proto> tag contains exactly one <prototype> sub-tag, which is the designated default prototype model. Other <prototype> tags can be listed outside of this tag. Where users are given the option of choosing from among different prototype models, the name "default" is always presented as an option and refers to this prototype model. It is also used in some situations where the prototype model is unknown but analysis needs to proceed.
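The following sketches how a default model and an additional model sit side by side. The model names and attribute values are hypothetical. The required <input> and <output> subtags are elided as comments for brevity.

```xml
<default_proto>
  <!-- Exactly one <prototype>; offered to users under the name "default" -->
  <prototype name="__cdecl" extrapop="4" stackshift="4">
    <!-- <input> and <output> resource lists go here -->
  </prototype>
</default_proto>
<!-- Additional models are listed outside <default_proto> -->
<prototype name="my_alt_model" extrapop="4" stackshift="4">
  <!-- <input> and <output> resource lists go here -->
</prototype>
```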
| Subtag | Attribute | Description |
|---|---|---|
| | name | The name of the prototype model |
| | extrapop | Amount stack pointer changes across a call or … |
| | stackshift | Amount stack changes due to the call mechanism |
| | type | (Optional) Generic calling convention type: … |
| | strategy | (Optional) Allocation strategy: … |
| <input> | | Resources for input variables |
| | pointermax | (Optional) Max size of parameter before converting to pointer |
| | thisbeforeretpointer | (Optional) |
| | killedbycall | (Optional) |
| <pentry> | | (1 or more) Storage resources |
| <output> | | Resources for return value |
| | killedbycall | (Optional) |
| <pentry> | | (1 or more) Storage resources |
| <returnaddress> | | (Optional) Storage location of return address |
| <unaffected> | | (Optional) Registers whose value is unaffected across calls |
| <killedbycall> | | (Optional) Registers whose value does not persist across calls |
| <likelytrash> | | (Optional) Registers that may hold a trash value entering the function |
| <localrange> | | (Optional) Range of stack locations that may hold mapped local variables |
The <prototype> tag specifies a prototype model. It must have a name attribute … Each <prototype> must specify the <input> and <output> subtags.
The <input> tag lists the resources used to pass input parameters to a function with this prototype. The varnodes used for passing are selected by an allocation strategy. The <input> tag contains a list of <pentry> sub-tags describing the varnodes. Depending on the allocation strategy, the ordering is typically important. … <input> should be considered as killed by call (see …).
The use of <pentry> subtags within the <output> tag is slightly different than for the input case. Technically, this tag is sensitive to the … <pentry> within the list that matches the data-type is used as the storage location. If none of the <pentry> storage locations fit the data-type, a … <pentry> tag in the resource list. The varnode whose corresponding tag occurs the earliest in the list becomes the formal return value for the function. If an output varnode matches no <pentry>, then it is rejected as a formal return value.
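The following sketches a fuller prototype model for a hypothetical 32-bit x86-style convention; all register names are assumptions. The <output> list places the floating-point register first. A float or double return value matches the ST0 entry. An integer or pointer return value does not match ST0, because of its metatype restriction, and lands in EAX, the earliest entry that fits.

```xml
<prototype name="__cdecl" extrapop="4" stackshift="4">
  <input>
    <pentry minsize="1" maxsize="500" align="4">
      <addr offset="4" space="stack"/>
    </pentry>
  </input>
  <output>
    <!-- Earliest entry: floating-point return values in ST0 -->
    <pentry minsize="4" maxsize="10" metatype="float" extension="float">
      <register name="ST0"/>
    </pentry>
    <!-- Integer and pointer return values up to 4 bytes -->
    <pentry minsize="1" maxsize="4">
      <register name="EAX"/>
    </pentry>
  </output>
  <!-- Registers preserved across calls in this hypothetical convention -->
  <unaffected>
    <register name="EBX"/>
    <register name="ESI"/>
    <register name="EDI"/>
    <register name="EBP"/>
  </unaffected>
  <!-- Scratch registers the callee may clobber -->
  <killedbycall>
    <register name="ECX"/>
    <register name="EDX"/>
  </killedbycall>
</prototype>
```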
| Subtag | Attribute | Description |
|---|---|---|
| | minsize | Size (in bytes) of smallest variable stored here |
| | maxsize | Size (in bytes) of largest variable stored here |
| | align | (Optional) Alignment of successive locations within this entry |
| | metatype | (Optional) Restriction on datatype: … |
| | extension | (Optional) How small values are extended: zero, sign, float, inttype, or none |
| <register> | | Storage location of the entry |
| | name | Name of register |
| <addr> | | (alternate form) |
| | space | Address space of the location |
| | offset | Offset (in bytes) of location |
The <pentry> tag describes the individual memory resources that make up both the <input> and <output> resource lists. These are consumed by the allocation strategy as it assigns storage for parameters and return values. Attributes describe restrictions on how a particular <pentry> resource can be used.
The storage location itself is specified by either the <register> or the <addr> subtag. The minsize and maxsize attributes restrict the size of the parameter to which the entry is assigned, and the metatype attribute restricts the type of the parameter. … unknown or no type restriction. The metatype attribute can be used to split out a separate floating-point resource list for some allocation strategies. In the standard strategy, any <pentry> that has the attribute metatype="float" is pulled out into a separate list from all the other entries.
The extension attribute indicates that variables are extended to fill the entire location if the datatype would otherwise occupy fewer bytes. The possible values are zero for zero extension, sign for sign extension, and float for floating-point extension. A value of inttype indicates the value is either sign or zero extended, depending on the original datatype. The default is none, for no extension.
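The following sketches a register entry that uses extension, assuming a hypothetical 32-bit register named r0. A 1-byte char or 2-byte short passed in r0 would be widened to all 4 bytes. Whether it is sign or zero extended depends on its own datatype.

```xml
<!-- r0 is a hypothetical register name -->
<pentry minsize="1" maxsize="4" extension="inttype">
  <register name="r0"/>
</pentry>
```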
The align attribute indicates that multiple variables can be drawn from the pentry resource. The first variable occupies bytes starting with the address of the storage location specified in the tag. Additional variables start at the next available aligned byte. The attribute value must be a positive integer that specifies the alignment. This is typically used to model parameters pulled from a stack resource. The example below draws up to 500 bytes of parameters from the stack, which are 4-byte aligned, starting at an offset of 16 bytes from the initial value of the stack pointer.
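The following is a stack entry matching that description. It is a sketch that assumes the stack space set up by <stackpointer>.

```xml
<!-- Up to 500 bytes of parameters, each 4-byte aligned, from stack offset 16 -->
<pentry minsize="1" maxsize="500" align="4">
  <addr offset="16" space="stack"/>
</pentry>
```

For example, a 1-byte parameter followed by a 4-byte parameter would occupy offsets 16 and 20. The second parameter starts at 20 because that is the next 4-byte-aligned offset after byte 16.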
One |
(1 or more) |
(1 or more) |
<unaffected>
tag which specifies that the value is unchanged across the call.
<unaffected>
or <killedbycall>
is treated as if it (1 or more) |
| Subtag | Attribute | Description |
|---|---|---|
| <range> | | (1 or more) Range of bytes eligible for local variables |
| | space | Address space containing range (Usually "stack") |
| | first | (Optional) Starting byte offset of range, default is 0 |
| | last | (Optional) Ending byte offset, default is maximal offset of space |
The <localrange> tag contains one or more <range> tags that explicitly describe all the possible ranges on the stack that can hold mapped local variables other than parameters. Individual functions will be assumed to use some subset of this region. The <range> tags give offsets relative to the incoming value of the stack pointer. This affects the decompiler's reconstruction of the stack frame for a function and parameter recovery.
A <localrange> tag replaces the default, so it needs to specify the default range if it wants to keep it.
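The following sketches a <localrange> for a hypothetical 32-bit stack space. All offsets are illustrative. Negative offsets below the incoming stack pointer are written as their unsigned 32-bit wraparound values. Because this tag replaces the default range, every region that should remain eligible has to be listed.

```xml
<localrange>
  <!-- Hypothetical locals region below the incoming stack pointer
       (0xfff00000 is -0x100000 as an unsigned 32-bit offset) -->
  <range space="stack" first="0xfff00000" last="0xffffffff"/>
  <!-- Hypothetical callee-owned scratch area above the return address -->
  <range space="stack" first="4" last="15"/>
</localrange>
```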