--- eip: 3670 title: EOF - Code Validation description: Validate EOF bytecode for correctness at the time of deployment. author: Alex Beregszaszi (@axic), Andrei Maiboroda (@gumb0), Paweł Bylica (@chfast) discussions-to: https://ethereum-magicians.org/t/eip-3670-eof-code-validation/6693 status: Stagnant type: Standards Track category: Core created: 2021-06-23 requires: 3540 --- ## Abstract Introduce code validation at contract creation time for EOF formatted ([EIP-3540](./eip-3540.md)) contracts. Reject contracts which contain truncated `PUSH`-data or undefined instructions. Legacy bytecode (code which is not EOF formatted) is unaffected by this change. ## Motivation Currently existing contracts require no validation of correctness and EVM implementations can decide how they handle truncated bytecode or undefined instructions. This change aims to bring code validity into consensus, so that it becomes easier to reason about bytecode. Moreover, EVM implementations may require fewer paths to decide which instruction is valid in the current execution context. If there's a desire to introduce new instructions without bumping the EOF version, having undefined instructions already deployed could potentially break such contracts, as some instructions might change their behavior. Rejecting to deploy undefined instructions allows introducing new instructions with or without bumping the EOF version. ### EOF1 forward compatibility The EOF1 format provides following forward compatibility properties: 1. New instructions can be defined for previously unassigned opcodes. These instructions may have immediate values. 2. Mandatory EOF sections may be made optional. 3. New optional EOF sections may be introduced. They can be placed in any order in relation to previously defined sections. ## Specification This feature is introduced on the same block EIP-3540 is enabled, therefore every EOF1-compatible bytecode MUST be validated according to these rules. 1. Previously deprecated instructions `CALLCODE` (0xf2) and `SELFDESTRUCT` (0xff), as well as instructions deprecated in EIP-3540, are invalid and their opcodes are undefined. (**NOTE** there are more instructions deprecated and rejected in EOF, as specced out by separate EIPs) 2. At contract creation time *code validation* is performed on each code section of the EOF container. The code is invalid if any of the checks below fails. For each instruction: 1. Check if the opcode is defined. The `INVALID` (0xfe) is considered defined. 2. Check if all instructions' immediate bytes are present in the code (code does not end in the middle of instruction). ## Rationale ### Immediate data Allowing implicit zero immediate data for `PUSH` instructions introduces inefficiencies to EVM implementations without any practical use-case (the value of a `PUSH` instruction at the code end cannot be observed by EVM). This EIP requires all immediate bytes to be explicitly present in the code. ### Rejection of deprecated instructions The deprecated instructions `CALLCODE` (0xf2) and `SELFDESTRUCT` (0xff) are removed from the `valid_opcodes` list to prevent their use in the future. ### BLOCKHASH instruction The `BLOCKHASH` instruction is well replaced by the system contract introduced in [EIP-2935](./eip-2935). However, despite a replacement being introduced this opcode has not been deprecated. This opcode will remain valid in EOF not to differentiate from legacy bytecode. ## Backwards Compatibility This change poses no risk to backwards compatibility, as it is introduced at the same time EIP-3540 is. The validation does not cover legacy bytecode (code which is not EOF formatted). ## Test Cases ### Contract creation Each case should be tested by submitting an EOF container to EOF contract creation (as specced out in a separate EIP). Each case should be tested with code placed in code sections at different indices. ### Valid codes - EOF code containing `INVALID` - EOF code with data section containing bytes that are undefined instructions ### Invalid codes - EOF code containing an undefined instruction - EOF code ending with incomplete `PUSH` instruction ## Reference Implementation ```python # The ranges below are as specified by Execution Specs for Shanghai. # Note: range(s, e) excludes e, hence the +1 shanghai_opcodes = [ *range(0x00, 0x0b + 1), *range(0x10, 0x1d + 1), 0x20, *range(0x30, 0x3f + 1), *range(0x40, 0x48 + 1), *range(0x50, 0x5b + 1), 0x5f, *range(0x60, 0x6f + 1), *range(0x70, 0x7f + 1), *range(0x80, 0x8f + 1), *range(0x90, 0x9f + 1), *range(0xa0, 0xa4 + 1), # Note: 0xfe is considered assigned. 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xfa, 0xfd, 0xfe, 0xff ] # Drop the opcodes deprecated and rejected in here and in EIP-3540 rejected_in_eof = [ 0x38, 0x39, 0x3b, 0x3c, 0x3f, 0x5a, 0xf1, 0xf2, 0xf4, 0xfa, 0xff ] valid_opcodes = [op for op in shanghai_opcodes not in rejected_in_eof] immediate_sizes = 256 * [0] immediate_sizes[0x60:0x7f + 1] = range(1, 32 + 1) # PUSH1..PUSH32 # Raises ValidationException on invalid code def validate_instructions(code: bytes): # Note that EOF1 already asserts this with the code section requirements assert len(code) > 0 pos = 0 while pos < len(code): # Ensure the opcode is valid opcode = code[pos] if opcode not in valid_opcodes: raise ValidationException("undefined opcode") # Skip immediate data pos += 1 + immediate_sizes[opcode] # Ensure last instruction's immediate doesn't go over code end if pos != len(code): raise ValidationException("truncated immediate") ``` ## Security Considerations See [Security Considerations of EIP-3540](./eip-3540.md#security-considerations). ## Copyright Copyright and related rights waived via [CC0](../LICENSE.md).