## Preliminaries

### File type

The Assembler automatically recognizes files with the extension `jbc` (Java ByteCode) as files to parse and assemble. Files with the extension `class` are disassembled to `jbc` files.

### Notation

Tokens are defined in this document using the `token := ...` notation. Tokens are written in _italic_, literals use the normal formatting. Regex-like operations, such as `(` and `)` for groups, `*` for 0 or more, and `?` for 0 or 1, are also used in the documentation.

### Comments

Standard Java syntax comments are possible: `//` for single-line comments and `/* */` for multi-line comments.

### Basic token types

In essence, the Assembler distinguishes 4 different token types (based on the `StreamTokenizer` tokens):

* `number`: any sequence of `0-9`, optionally starting with `-` for negative numbers, and containing a single `.` for non-integer numbers.
* `word`: any sequence of `-`, `.`, `0-9`, `A-Z`, `a-z`, and all characters with a value from 240 up to and including 255. A `word` must not start with a `number`.
* `string`: any sequence of characters surrounded by double quotes (`"`).
    * `string` can contain escaped characters:
        * `\a` for the bell character
        * `\b` for the backspace character
        * `\f` for the new page character
        * `\n` for the new line character
        * `\r` for the carriage return character
        * `\t` for the horizontal tab character
        * `\v` for the vertical tab character
    * Additionally, `string` can contain octal-escaped characters: `\xxx` where x is a `0-7` digit (up to `\377`).
* `character`: a single character, surrounded by single quotes (`'`).
    * `character` follows the same escape rules as `string`, e.g. `'\n'` and `'\177'` are valid characters.

### Types

Types generally follow the Java syntax, albeit less restrictively: any word can be a type, and any type can be followed by `[]` to denote an array.
Method arguments also follow Java, with the important distinction that no argument names are specified.
type := word
type := type []
methodArguments := ( )
methodArguments := ( (type ,)* type )

### Access Flags

In most cases, every Java bytecode access flag can be combined, even if these combinations would be meaningless, or illegal for the JVM.
The exception is a set of class access flags, which are conveniently expressed as class types rather than access flags.

classAccessFlag := public
classAccessFlag := private
classAccessFlag := protected
classAccessFlag := static
classAccessFlag := final
classAccessFlag := super
classAccessFlag := synchronized
classAccessFlag := volatile
classAccessFlag := transient
classAccessFlag := bridge
classAccessFlag := varargs
classAccessFlag := native
classAccessFlag := abstract
classAccessFlag := strictfp
classAccessFlag := synthetic
classAccessFlag := mandated
classAccessFlag := open
classAccessFlag := transitive
classAccessFlag := static_phase
accessFlag := classAccessFlag
accessFlag := module
accessFlag := enum
accessFlag := interface
accessFlag := annotation

## Class files
classFile := import* version class
import := import type ;
version := version number ;

As in Java, it's possible to import classes at the top of the file. Only fully qualified class names are allowed; wildcard and static imports are not supported.
Every file should also declare the Java version to assemble for. Valid versions are: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5, 1.6, 6, 1.7, 7, 1.8, 8, 1.9, 9, 10, 11, 12, and 13.

Example:
```
import java.lang.String;
import java.lang.System;
import java.io.PrintStream;
version 12;
public class MyClass {
    public static void main(final String[] args) {
        getstatic System#PrintStream out
        ldc "Hello World!"
        invokevirtual PrintStream#void println(String)
        return
    }
}
```

## Classes
class := classAccessFlag* classType word superClassSpecifier classInterfacesSpecifier attributes? classBody
class := classAccessFlag* interfaceType word interfaceInterfacesSpecifier attributes? classBody
classType := class
classType := enum
classType := module
interfaceType := interface
interfaceType := @interface
superClassSpecifier := extends word
classInterfacesSpecifier := implements (word ,)* word
interfaceInterfacesSpecifier := extends (word ,)* word
classBody := ;
classBody := { classMember* }
Classes are defined analogously to Java: classes and enums can extend superclasses and implement interfaces.
Even though the syntax also allows modules to extend a superclass and implement interfaces, this is illegal for the JVM.
Interfaces and annotations (`@interface`) use the `extends` keyword as syntactic sugar to implement interfaces.
Additionally, classes extend `java.lang.Object` by default, enums extend `java.lang.Enum` by default, and annotations implement `java.lang.annotation.Annotation` by default.
Classes can optionally declare attributes and class members.
Examples:
```
public class MyException extends RuntimeException { // No attributes
// No fields
// No methods
}
public enum MyEnum; // Extends java.lang.Enum by default, no attributes, no fields, no methods
public @interface MyAnnotation [ // Implements java.lang.annotation.Annotation by default
Synthetic;
Deprecated;
]; // No fields, no methods
public module module-info [ // Does not extend java.lang.Object by default
// No attributes
] {
// No fields, no methods
}
public class MyClass; // Extends java.lang.Object by default, no attributes, no fields, no methods
```
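The examples above all rely on the defaults. As a minimal sketch derived only from the grammar in this section, explicit superclasses and interface lists could be written as follows; the names `MyList` and `MyInterface` are hypothetical.

```
// Explicit superclass and a comma-separated list of implemented interfaces
public final class MyList extends java.util.AbstractList implements java.util.RandomAccess, java.io.Serializable;
// Interfaces list their super-interfaces after extends
public interface MyInterface extends java.lang.Runnable, java.lang.AutoCloseable {
// No fields, no methods
}
```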
## Class members
classMember := field
classMember := method
field := accessFlag* type word (= fieldConstant)? attributes? ;
method := accessFlag* attributes? methodBody
method := accessFlag* type methodName methodArgumentsDefinition methodThrows? attributes? methodBody
methodBody := ;
methodBody := { instruction* }
methodName := word
methodName := <init>
methodName := <clinit>
methodThrows := throws (type ,)* type
methodArgumentsDefinition := ( )
methodArgumentsDefinition := ( (methodArgumentDefinition ,)* methodArgumentDefinition )
methodArgumentDefinition := accessFlag* type word?
Fields and methods are also defined analogously to Java.
Fields can be initialized using the equals sign (`=`), which will set the ConstantValue attribute.
Although this syntax is always valid, this initialization is only legal for the JVM if the field has `static` and `final` access flags.
The loadable constant type does not have to match the field type; indeed, some mismatched combinations are perfectly valid: a `boolean` field can be initialized using an `intConstant`.
Fields can also optionally declare attributes.
Examples:
```
public static final int INT_FIELD = 0; // No attributes
public static final boolean BOOLEAN_FIELD = 1; // The loadable constant type does not have to match the field type.
java.lang.String myStringField [
Deprecated;
Synthetic;
];
protected transient char someChar = 'a' []; // This is illegal for the JVM, as the field is not static and final.
```
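A few more initializers, sketched from the `fieldConstant` notations defined in this document, show the literal-suffix and 'cast' formats side by side. The field names are hypothetical, and the snippet has not been run through the Assembler.

```
public static final long LONG_FIELD = 42L; // longConstant with a literal suffix
public static final long OTHER_LONG_FIELD = (long) 42; // longConstant in cast format
public static final float FLOAT_FIELD = (float) 1; // floatConstant in cast format
public static final short SHORT_FIELD = (short) 7; // shortConstant, converted to an integer constant
public static final java.lang.String STRING_FIELD = "text"; // stringConstant
```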
A method can be declared in _class initializer_ format or regular Java method format.
Method arguments can contain access flags and argument names, and the method can have a `throws` clause.
Methods can also optionally have a method body, which will set the Code attribute, and declare other attributes.
Examples:
```
static { // No attributes
return
} // Class initializer
public static final void main(synthetic final java.lang.String[] args) throws java.lang.Throwable [
Deprecated;
Synthetic;
] {
new java.lang.Exception
dup
invokespecial java.lang.Exception#void
}
```

fieldReference := type # type word
fieldReference := # type word
methodReference := type # type word methodArguments
methodReference := # type word methodArguments
invokedynamicReference := number type word methodArguments
methodHandle := getstatic fieldReference
methodHandle := putstatic fieldReference
methodHandle := getfield fieldReference
methodHandle := putfield fieldReference
methodHandle := invokevirtual methodReference
methodHandle := invokestatic methodReference
methodHandle := invokespecial methodReference
methodHandle := newinvokespecial methodReference
methodHandle := invokeinterface methodReference

If the first `type` of `fieldReference` or `methodReference` is not supplied, the type will be the type of the current class being assembled.
In other words, these notations are shorthand for accessing fields or invoking methods of the current class.
`invokedynamicReference` has 4 arguments: the index in the BootstrapMethods attribute, the return type of the method, the name of the method, and the method arguments.

Because the Assembler has to know which constant to assign a value to, there are multiple notations for most constants.
Some constants have a defining format, for example `booleanConstant`; however, it is always possible to explicitly provide the type of the constant, in Java 'cast' format.
boolean := true
boolean := false
doubleLiteralSuffix := D
doubleLiteralSuffix := d
floatLiteralSuffix := F
floatLiteralSuffix := f
longLiteralSuffix := L
longLiteralSuffix := l
booleanConstant := boolean
booleanConstant := (boolean) boolean
booleanConstant := (boolean) number
byteConstant := (byte) number
charConstant := character
charConstant := (char) character
charConstant := (char) number
doubleConstant := number doubleLiteralSuffix
doubleConstant := (double) number doubleLiteralSuffix
doubleConstant := (double) number
floatConstant := number floatLiteralSuffix
floatConstant := (float) number floatLiteralSuffix
floatConstant := (float) number
intConstant := number
intConstant := (int) number
longConstant := number longLiteralSuffix
longConstant := (long) number longLiteralSuffix
longConstant := (long) number
shortConstant := (short) number
stringConstant := string
stringConstant := (String) string
classConstant := type
classConstant := (Class) type
methodHandleConstant := (MethodHandle) methodHandle
methodTypeConstant := (MethodType) type methodArguments
dynamicConstant := (Dynamic) number type word
fieldConstant := booleanConstant
fieldConstant := byteConstant
fieldConstant := charConstant
fieldConstant := doubleConstant
fieldConstant := floatConstant
fieldConstant := intConstant
fieldConstant := longConstant
fieldConstant := shortConstant
fieldConstant := stringConstant
loadableConstant := fieldConstant
loadableConstant := classConstant
loadableConstant := methodHandleConstant
loadableConstant := methodTypeConstant
loadableConstant := dynamicConstant

`booleanConstant`, `byteConstant`, `charConstant`, `intConstant`, and `shortConstant` are all converted to integer constants by the Assembler.
This means that, in most cases, those constants are indistinguishable in the compiled class file.
`dynamicConstant` has 3 arguments: the index in the BootstrapMethods attribute, the type of the constant and the name of the constant.

## Attributes
attributes := [ attribute* ]

Some attributes are not explicitly parsed by the Assembler, but handled in a special way:

* ConstantValue: assignment similar to Java (see section _Fields_)
* MethodParameters: parameter access flags and names similar to Java (see section _Methods_)
* Exceptions: methods throw exceptions similar to Java (see section _Methods_)
* StackMap and StackMapTable: code is preverified by the ProGuard preverifier and these attributes are generated automatically.
* LineNumberTable, LocalVariableTable, and LocalVariableTypeTable: using pseudo-instructions in the code (see subsection _Code attribute_)

These attributes cannot be defined explicitly, and will not be printed explicitly by the Disassembler.

### BootstrapMethods attribute
attribute := BootstrapMethods { bootstrapMethod* }
bootstrapMethod := methodHandle { bootstrapMethodArgument* }
bootstrapMethodArgument := loadableConstant ;
Example:
```
BootstrapMethods {
invokestatic java.lang.invoke.StringConcatFactory#java.lang.invoke.CallSite makeConcatWithConstants(java.lang.invoke.MethodHandles$Lookup, java.lang.String, java.lang.invoke.MethodType, java.lang.String, java.lang.Object[]) {
"abc \001 def";
}
}
```
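As a sketch of how such an entry might be referenced, an `invokedynamic` instruction names the bootstrap method by its index in the BootstrapMethods attribute, followed by the return type, the method name, and the method arguments. The snippet assumes the bootstrap method above sits at index 0; the local variable slot and the call-site signature are illustrative.

```
aload_1 // Push the string to splice into the recipe (assumed to be in local 1)
invokedynamic 0 java.lang.String makeConcatWithConstants(java.lang.String) // 0 = index in BootstrapMethods
areturn
```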
### SourceFile attribute
attribute := SourceFile string ;

Example: `SourceFile "Assembler.java";`

### SourceDir attribute

attribute := SourceDir string ;

Example: `SourceDir "My Source Directory";`

### InnerClasses attribute
attribute := InnerClasses { innerClass* }
innerClass := classAccessFlag* innerClassType innerName? outerClass? ;
innerClassType := classType
innerClassType := interfaceType
innerName := as word
outerClass := in type
Both `innerName` and `outerClass` are optional. Note that even though _module_ is a valid class type, it has no valid meaning in inner classes in Java bytecode.
Example:
```
InnerClasses {
public class InnerClass as InnerName in OuterClass;
public static @interface InnerAnnotation as Annotation;
public enum InnerEnum in EnclosingClass;
private module InnerModule;
}
```
### EnclosingMethod attribute
attribute := EnclosingMethod enclosingClass enclosingMethod? ;
enclosingClass := type
enclosingMethod := # type word methodArguments

Although the enclosing class always has to be specified, `enclosingMethod` is optional.

Example:
```
EnclosingMethod EnclosingClass # void enclosingMethod(java.lang.String, java.lang.Object);
EnclosingMethod AnotherEnclosingClass;
```

### NestHost attribute

attribute := NestHost type ;

Example:
```
NestHost java.lang.Class;
```

### NestMembers attribute
attribute := NestMembers { nestMember* }
nestMember := type ;
Example:
```
NestMembers {
java.lang.Class;
java.lang.String;
}
```
### Deprecated attribute
attribute := Deprecated ;

### Synthetic attribute

attribute := Synthetic ;

### Signature attribute
attribute := Signature string ;

Example:
```
Signature "Ljava/lang/Enum
```

### Code attribute

attribute := Code { instruction* } attributes?
#### Instructions
instruction := nop
instruction := aconst_null
instruction := iconst_m1
instruction := iconst_0
instruction := iconst_1
instruction := iconst_2
instruction := iconst_3
instruction := iconst_4
instruction := iconst_5
instruction := lconst_0
instruction := lconst_1
instruction := fconst_0
instruction := fconst_1
instruction := fconst_2
instruction := dconst_0
instruction := dconst_1
instruction := bipush number
instruction := sipush number
instruction := ldc loadableConstant
instruction := ldc_w loadableConstant
instruction := ldc2_w loadableConstant
instruction := iload number
instruction := lload number
instruction := fload number
instruction := dload number
instruction := aload number
instruction := iload_0
instruction := iload_1
instruction := iload_2
instruction := iload_3
instruction := lload_0
instruction := lload_1
instruction := lload_2
instruction := lload_3
instruction := fload_0
instruction := fload_1
instruction := fload_2
instruction := fload_3
instruction := dload_0
instruction := dload_1
instruction := dload_2
instruction := dload_3
instruction := aload_0
instruction := aload_1
instruction := aload_2
instruction := aload_3
instruction := iaload
instruction := laload
instruction := faload
instruction := daload
instruction := aaload
instruction := baload
instruction := caload
instruction := saload
instruction := istore number
instruction := lstore number
instruction := fstore number
instruction := dstore number
instruction := astore number
instruction := istore_0
instruction := istore_1
instruction := istore_2
instruction := istore_3
instruction := lstore_0
instruction := lstore_1
instruction := lstore_2
instruction := lstore_3
instruction := fstore_0
instruction := fstore_1
instruction := fstore_2
instruction := fstore_3
instruction := dstore_0
instruction := dstore_1
instruction := dstore_2
instruction := dstore_3
instruction := astore_0
instruction := astore_1
instruction := astore_2
instruction := astore_3
instruction := iastore
instruction := lastore
instruction := fastore
instruction := dastore
instruction := aastore
instruction := bastore
instruction := castore
instruction := sastore
instruction := pop
instruction := pop2
instruction := dup
instruction := dup_x1
instruction := dup_x2
instruction := dup2
instruction := dup2_x1
instruction := dup2_x2
instruction := swap
instruction := iadd
instruction := ladd
instruction := fadd
instruction := dadd
instruction := isub
instruction := lsub
instruction := fsub
instruction := dsub
instruction := imul
instruction := lmul
instruction := fmul
instruction := dmul
instruction := idiv
instruction := ldiv
instruction := fdiv
instruction := ddiv
instruction := irem
instruction := lrem
instruction := frem
instruction := drem
instruction := ineg
instruction := lneg
instruction := fneg
instruction := dneg
instruction := ishl
instruction := lshl
instruction := ishr
instruction := lshr
instruction := iushr
instruction := lushr
instruction := iand
instruction := land
instruction := ior
instruction := lor
instruction := ixor
instruction := lxor
instruction := iinc number number
instruction := i2l
instruction := i2f
instruction := i2d
instruction := l2i
instruction := l2f
instruction := l2d
instruction := f2i
instruction := f2l
instruction := f2d
instruction := d2i
instruction := d2l
instruction := d2f
instruction := i2b
instruction := i2c
instruction := i2s
instruction := lcmp
instruction := fcmpl
instruction := fcmpg
instruction := dcmpl
instruction := dcmpg
instruction := ifeq label
instruction := ifne label
instruction := iflt label
instruction := ifge label
instruction := ifgt label
instruction := ifle label
instruction := if_icmpeq label
instruction := if_icmpne label
instruction := if_icmplt label
instruction := if_icmpge label
instruction := if_icmpgt label
instruction := if_icmple label
instruction := if_acmpeq label
instruction := if_acmpne label
instruction := goto label
instruction := jsr label
instruction := ret number
instruction := tableswitch { switchCase* }
instruction := lookupswitch { switchCase* }
instruction := ireturn
instruction := lreturn
instruction := freturn
instruction := dreturn
instruction := areturn
instruction := return
instruction := getstatic fieldReference
instruction := putstatic fieldReference
instruction := getfield fieldReference
instruction := putfield fieldReference
instruction := invokevirtual methodReference
instruction := invokespecial methodReference
instruction := invokestatic methodReference
instruction := invokeinterface methodReference
instruction := invokedynamic invokedynamicReference
instruction := new type
instruction := newarray type
instruction := anewarray type
instruction := arraylength
instruction := athrow
instruction := checkcast type
instruction := instanceof type
instruction := monitorenter
instruction := monitorexit
instruction := multianewarray type number
instruction := ifnull label
instruction := ifnonnull label
instruction := goto_w label
instruction := jsr_w label
switchCase := case number : label
switchCase := default : label
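To illustrate the `switchCase` notation, the following minimal sketch branches on an `int` argument. It is derived from the grammar only and has not been run through the Assembler; the label names (`one`, `ten`, `other`) are hypothetical, and labels are declared with the `label :` pseudo-instruction described later in this section.

```
iload_0
lookupswitch {
case 1: one
case 10: ten
default: other
}
one:
iconst_1
ireturn
ten:
bipush 10
ireturn
other:
iconst_0
ireturn
```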
Note that the `wide` instruction is not present; it is replaced by the following pseudo-instructions:

instruction := iload_w number
instruction := lload_w number
instruction := fload_w number
instruction := dload_w number
instruction := aload_w number
instruction := istore_w number
instruction := lstore_w number
instruction := fstore_w number
instruction := dstore_w number
instruction := astore_w number
instruction := iinc_w number number
instruction := ret_w number

Furthermore, pseudo-instructions exist for labels, try-catch blocks, local variables, local variable types, and line numbers:

instruction := label :
instruction := catch type label label
instruction := catch any label label
instruction := startlocalvar number type word
instruction := endlocalvar number
instruction := startlocalvartype number string word
instruction := endlocalvartype number
instruction := line number
label := word

A `catch` pseudo-instruction specifies an exception handler at the location of the pseudo-instruction.
The catch type, the start and end labels, and the handler location will be added to the exception table in the Code attribute.

`startlocalvar` and `startlocalvartype` specify the start of a local variable or local variable type, respectively; `endlocalvar` and `endlocalvartype` specify the end.
These pseudo-instructions modify the LocalVariableTable or LocalVariableTypeTable attributes in the Code attribute.
The `number` defines the index of the local variable or local variable type.
Every `startlocalvar` or `startlocalvartype` must have an accompanying `endlocalvar` or `endlocalvartype`, placed after it in the instructions.

`line` specifies the line `number` at a position in the bytecode.
The line `number` and bytecode offset will be stored in a LineNumberTable attribute.

### Annotations attributes
attribute := RuntimeVisibleAnnotations { annotation* }
attribute := RuntimeInvisibleAnnotations { annotation* }
attribute := RuntimeVisibleParameterAnnotations { parameterAnnotation* }
attribute := RuntimeInvisibleParameterAnnotations { parameterAnnotation* }
attribute := RuntimeVisibleTypeAnnotations { typeAnnotation* }
attribute := RuntimeInvisibleTypeAnnotations { typeAnnotation* }
attribute := AnnotationDefault elementValue
annotation := type { (word = elementValue)* }
parameterAnnotation := { annotation* }
typeAnnotation := annotation targetInfo { typePath* }
Examples:
```
RuntimeVisibleAnnotations {
java.lang.Deprecated {
since = "sinceVersion";
forRemoval = true;
}
}
RuntimeInvisibleAnnotations {
java.lang.Deprecated {} // Empty values
}
RuntimeVisibleParameterAnnotations {
{} // Empty annotations for parameter 0
{
java.lang.Deprecated {
since = "sinceVersion";
forRemoval = true;
}
}
}
RuntimeInvisibleParameterAnnotations {
{
java.lang.Deprecated {} // Empty values
}
{} // Empty annotations for parameter 1
{} // Empty annotations for parameter 2
{} // Empty annotations for parameter 3
}
RuntimeVisibleTypeAnnotations {
java.lang.Deprecated {
since = "sinceVersion";
forRemoval = true;
} local_variable {
start0 end0 0;
start10 end10 10;
} {} // Empty type path
}
RuntimeVisibleTypeAnnotations {
java.lang.Deprecated {} argument_generic_method_new newLabel 1 {
array;
type_argument 1;
}
}
RuntimeInvisibleTypeAnnotations {
java.lang.Deprecated {} field {} // Empty values, empty type path
}
AnnotationDefault {
false; // Boolean element value
true; // Boolean element value
(byte) 1; // Byte element value
'2'; // Char element value
3.0D; // Double element value
4F; // Float element value
5; // Int element value
6l; // Long element value
(short) 7; // Short element value
"string"; // String element value
java.lang.Class; // Class element value
Enum#Constant; // Enum constant element value
@java.lang.Deprecated {} // Annotation element value
{} // Array element value
} // Array element value
```
#### Element values
elementValue := booleanConstant ;
elementValue := byteConstant ;
elementValue := charConstant ;
elementValue := doubleConstant ;
elementValue := floatConstant ;
elementValue := intConstant ;
elementValue := longConstant ;
elementValue := shortConstant ;
elementValue := stringConstant ;
elementValue := classConstant ;
elementValue := (Enum) type # word ;
elementValue := type # word ;
elementValue := (Annotation) annotation
elementValue := @ annotation
elementValue := (Array) { elementValue* }
elementValue := { elementValue* }
Apart from the usual primitive constants, string constants, and class constants, element values can also denote enum constants (enum type + constant name), annotations and arrays.
Note that annotation element values and array element values do not end with a `;`, as they already (either implicitly or explicitly) end with a `}`.
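The examples earlier in this section mostly use the implicit notations. As a sketch based on the grammar above, the explicit 'cast' forms could look as follows; like the earlier `AnnotationDefault` example, the array mixes element types purely for illustration, and the enum and annotation types are arbitrary choices.

```
AnnotationDefault (Array) {
(int) 5; // Explicit int element value
(String) "string"; // Explicit string element value
(Class) java.lang.Class; // Explicit class element value
(Enum) java.lang.annotation.RetentionPolicy#RUNTIME; // Explicit enum constant element value
(Annotation) java.lang.Deprecated {} // Explicit annotation element value, no trailing ;
}
```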
#### Target infos
targetInfo := parameter_generic_class number
targetInfo := parameter_generic_method number
targetInfo := extends number
targetInfo := bound_generic_class number number
targetInfo := bound_generic_method number number
targetInfo := field
targetInfo := return
targetInfo := receiver
targetInfo := parameter number
targetInfo := throws number
targetInfo := local_variable { localVar* }
targetInfo := resource_variable { localVar* }
targetInfo := catch number
targetInfo := instance_of label
targetInfo := new label
targetInfo := method_reference_new label
targetInfo := method_reference label
targetInfo := cast label number
targetInfo := argument_generic_method_new label number
targetInfo := argument_generic_method label number
targetInfo := argument_generic_method_reference_new label number
targetInfo := argument_generic_method_reference label number
localVar := label label number ;
In general, the arguments of the target infos roughly match the ones specified in the Class File Format specification.
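A few target infos not covered by the examples above can be sketched as follows. The reading of each `number` (parameter index, index into the `throws` list, type argument index) follows the Class File Format specification and is an assumption about how the Assembler maps them; `castLabel` is a hypothetical label that would have to be declared in the surrounding code.

```
RuntimeInvisibleTypeAnnotations {
java.lang.Deprecated {} parameter 0 {} // First formal parameter
java.lang.Deprecated {} throws 1 {} // Second type in the throws clause
java.lang.Deprecated {} cast castLabel 0 {} // Cast at castLabel, first type argument
}
```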
#### Type path
typePath := array number? ;
typePath := inner_type number? ;
typePath := wildcard number? ;
typePath := type_argument number? ;

Although every type path has an optional `number` argument, this argument only has meaning in combination with `type_argument`.
In that case, the `number` denotes which type argument is annotated (see the Class File Format specification for more details).

### Module attribute
attribute := Module accessFlag* word word? { moduleDirective* }
moduleDirective := requires accessFlag* word word? ;
moduleDirective := exports accessFlag* type exportsTo? ;
moduleDirective := opens accessFlag* type opensTo? ;
moduleDirective := uses type ;
moduleDirective := provides type providesWith? ;
exportsTo := to (word ,)* word
opensTo := to (word ,)* word
providesWith := with (type ,)* type
The module attribute specifies the module access flags, the module name, and an optional module version.
As the module version must be a `word`, it cannot start with a `number`.
_exports_, _opens_, and _provides_ all have optional arguments specifying the directive.
These arguments use the same syntax as their Java counterparts.
Example:
```
Module open synthetic mandated ModuleName v1.0 {
requires transitive some.package.RequiredModule v1.0;
requires static_phase some.package.OtherRequiredModule;
requires synthetic some.package.SyntheticRequiredModule alpha;
requires mandated some.package.MandatedRequiredModule beta;
exports synthetic some.package.exportedpackage;
exports mandated some.package.mandated.exportedpackage to some.package.export.to.package, some.package.export.to.otherpackage, some.package.export.to.finalpackage;
opens synthetic some.package.openedpackage;
opens mandated some.package.mandated.openedpackage to some.package.open.to.package, some.package.open.to.otherpackage, some.package.open.to.finalpackage;
uses some.package.UsedClass;
uses some.package.OtherUsedClass;
uses some.package.MoreUsedClass;
uses some.package.FinalUsedClass;
provides some.package.ProvidedClass;
provides some.package.OtherProvidedClass with some.package.OtherProvidedClassImpl, some.package.OtherProvidedClassImpl1;
provides some.package.FinalProvidedClass;
}
```
### ModuleMainClass attribute
attribute := ModuleMainClass type ;

Example:
```
ModuleMainClass some.package.ModuleMainClass;
```

### ModulePackages attribute
attribute := ModulePackages { type* }
Example:
```
ModulePackages {
some.package;
some.other.package;
}
```