EVM
An account is an object in the world state.
The world state is a mapping between addresses and account states.
Account state contains four fields:
nonce, balance, storage hash, code hash
An externally owned account (EOA) is controlled by a private key and cannot contain EVM code.
A contract account (CA) contains EVM code.
You can determine whether an address is a smart contract by checking the size of the code stored at the address.
The assembly opcode extcodesize is used in Solidity functions to get the size of the code at a particular address. If the code size at the address is greater than 0, the address is a smart contract.
Some smart contracts try to prevent other smart contracts from calling their functions by implementing an account code size check.
The code size check determines whether the address interacting with the contract contains code, and if it does, the function is not executed.
Note that this check is not airtight: while a contract's constructor is running, its code size is still 0, so a call made from a constructor passes the check.
A transaction is a single cryptographically-signed instruction.
It can be either a
contract creation
message call
If contract creation:
The To field of the transaction is empty (not specified).
Data is the init code fragment, including the contract binary.
The init code is NOT stored on the blockchain, as it is only used for setup (the constructor); only the runtime code it returns is stored.
If message call:
The To field has to be a 160-bit (20-byte) address.
For an EOA, the address is derived as the last 20 bytes of the Keccak-256 hash of the public key controlling the account, e.g., cd2a3d9f938e13cd947ec05abc7fe734df8dd826.
This is a hexadecimal format (base 16 notation), which is often indicated explicitly by prepending 0x to the address.
Since each byte of the address is represented by 2 hex characters, a prefixed address is 42 characters long.
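The length arithmetic above (20 bytes, 40 hex characters, 42 with the prefix) can be checked with a small sketch. The function name is hypothetical, and it only checks the shape of the string; it does not verify a mixed-case checksum or that the account exists.

```python
def is_well_formed_address(text: str) -> bool:
    """Return True if text looks like a 0x-prefixed, 20-byte hex address."""
    if not text.startswith("0x"):
        return False
    body = text[2:]
    if len(body) != 40:          # 20 bytes * 2 hex characters per byte
        return False
    try:
        return len(bytes.fromhex(body)) == 20
    except ValueError:           # non-hex characters
        return False


# The example address from the notes, with the 0x prefix added.
example = "0x" + "cd2a3d9f938e13cd947ec05abc7fe734df8dd826"
print(len(example), is_well_formed_address(example))
```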
A message comprises Data (a set of bytes) and Value (specified in Ether).
A message can be triggered by a transaction or by EVM code.
A transaction is an atomic operation: it cannot be divided or interrupted.
It is either completed in its entirety or nothing is done → no halfway, middling state.
Transactions cannot overlap; they must be executed sequentially.
Transaction order is not guaranteed.
Order of transactions within a block: determined by the block producer (miners under proof of work, validators under proof of stake).
Order between blocks: determined by the consensus algorithm, such as PoW (or PoS since the Merge).
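The all-or-nothing behaviour of a transaction can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the state is a plain dictionary of balances, the names are hypothetical, and gas is ignored (on the real network a failed transaction is still included in a block and still pays for gas).

```python
import copy


class InsufficientFunds(Exception):
    pass


def apply_transfers(state: dict, transfers: list) -> dict:
    """Apply every (sender, recipient, amount) step, or none of them."""
    working = copy.deepcopy(state)        # snapshot: only the copy is mutated
    try:
        for sender, recipient, amount in transfers:
            if working.get(sender, 0) < amount:
                raise InsufficientFunds(sender)
            working[sender] -= amount
            working[recipient] = working.get(recipient, 0) + amount
    except InsufficientFunds:
        return state                      # revert: the original state is untouched
    return working                        # commit: all steps succeeded


balances = {"alice": 10, "bob": 0}
ok = apply_transfers(balances, [("alice", "bob", 4)])
failed = apply_transfers(balances, [("alice", "bob", 4), ("alice", "bob", 7)])
print(ok, failed)
```

In the second call the first step would succeed on its own, but because the second step fails, neither is applied.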
EVM code is executed on the EVM
The Ethereum Virtual Machine is the runtime environment for smart contracts in Ethereum
It is stack-based and does not have registers.
It uses big endian byte ordering for instructions, memory, and input data. Words in the EVM are 256 bits (32 bytes) wide.
💡 The word size is the maximum size that the majority of operations work with.
On a stack, you PUSH data onto the top, POP data off, and apply instructions like ADD or MUL to the first few values that lie on top of it.
The Stack - with maximum 1024 elements. Each element is 256 bits (1 word).
Memory (volatile memory) - byte addressed linear memory.
Storage (persistent memory) - key-value store of 256 bit to 256 bit pairs.
All operations are performed on the stack
Memory is like a big array (that’s the “linear” part).
It can be written at the byte level (MSTORE8, 8-bit values) or at the word level (MSTORE, 256-bit values), and is read one 256-bit word at a time (MLOAD).
Memory has no fixed size limit, but its growth is constrained by gas costs.
Account storage is more like a map, pairing 256 bit keys with 256 bit values (words to words). Each location in storage is 0 initialized.
Both the stack and memory are volatile (deleted after contract execution), but storage is persistent and stored in Ethereum’s world state (sticks around after execution).
Initially, all storage and memory are set to zero in the EVM.
The program code is stored in virtual read-only memory (virtual ROM) that is accessible using the CODECOPY instruction.
The CODECOPY instruction copies the program code into the main memory.
EVM code is the bytecode that the EVM can natively execute.
There exist several implementations of the EVM specification, including the one embedded in Geth. We’ll use Geth as it is one of the most popular Ethereum clients and handily supports an EVM command line utility for easy invocation.
$ git clone https://github.com/ethereum/go-ethereum.git
$ cd go-ethereum
$ make all
This, in addition to building the rest of Geth from source, will leave you with an evm executable in the build folder. Feel free to copy this executable to wherever you’d like to work.
Opcode reference: https://www.evm.codes/, https://ethervm.io/
Now that we have an EVM implementation set up, let’s write some code! First, create an empty easm file to edit and open it up in your favourite editor.
$ touch hello.easm
Let’s write some EVM assembly, a text-based, human-readable format that geth’s evm binary assembles into EVM bytecode for us.
Run ./evm compile hello.easm.
You should get the rather intractable-looking string 600260020200 out.
This is your EVM assembly assembled into EVM bytecode (in hexadecimal format)!
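To see what the string 600260020200 encodes, here is a minimal disassembler sketch. It is not part of geth; the function name and the tiny opcode table (only the opcodes needed here) are assumptions for illustration.

```python
# Only the opcodes needed for this example; the real instruction set is much larger.
OPCODES = {0x00: "STOP", 0x02: "MUL", 0x60: "PUSH1"}


def disassemble(hex_code: str) -> list:
    """Turn a hex bytecode string into a list of human-readable instructions."""
    code = bytes.fromhex(hex_code)
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        if name == "PUSH1":
            # PUSH1 is followed by one byte of immediate data.
            listing.append(f"PUSH1 0x{code[pc + 1]:02x}")
            pc += 2
        else:
            listing.append(name)
            pc += 1
    return listing


listing = disassemble("600260020200")
print(listing)
```

Each instruction is one byte (60 is PUSH1, 02 is MUL, 00 is STOP), and PUSH1 carries one extra byte holding the value to push.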
Now go ahead and try ./evm --debug run hello.easm.
The first value (0x) is our return value. We have not explicitly called the RETURN opcode, so this is empty.
We can also see the execution trace of our bytecode, along with the program counter value (pc), current remaining gas, and gas cost of each instruction.
Additionally we can see a full view of the stack and every step of execution.
Notice that the stack is shown top to bottom (index 0 is the top most element).
From this trace we can see that our bytecode first pushes the value “2” on top of the stack, and then pushes another “2”. It then executes MUL, which pops the top two values off the stack, multiplies them, and pushes the result (4). Finally, it executes STOP, halting execution.
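The same steps can be reproduced with a toy stack machine. This is a sketch that supports only PUSH1, MUL and STOP and ignores gas; it is not how geth implements the EVM, and the function name is hypothetical.

```python
WORD = 2**256          # stack items are 256-bit words
STACK_LIMIT = 1024     # maximum number of stack elements


def run(hex_code: str) -> list:
    """Execute PUSH1/MUL/STOP and return the final stack, top element first."""
    code = bytes.fromhex(hex_code)
    stack = []         # the end of the Python list is the top of the stack
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x60:                       # PUSH1: push the next byte
            if len(stack) >= STACK_LIMIT:
                raise OverflowError("stack overflow")
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x02:                     # MUL: pop two, push product mod 2^256
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
            pc += 1
        elif op == 0x00:                     # STOP: halt execution
            break
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack[::-1]                       # index 0 is the top, as in the trace


final_stack = run("600260020200")
print(final_stack)
```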
Looking at the RETURN opcode definition, you can see that it returns values stored in memory (the expression form is return memory[offset:offset+length]). So, let’s write a value to memory!
As previously mentioned, memory is sequentially addressable. We’ll use the MSTORE8 instruction, which sets memory at the index value on the top of the stack (we’ll call this s) to the low byte of the second value on the stack (s+1).
Basically we’re telling the EVM to store the 8-bit value “2” at location “0”.
Running this through the evm, we can now see a layout of memory too, with our 2 value sitting in the first byte. Nice!
Next, let’s go ahead and try that return.
This is telling the evm “go ahead and return memory values 0 to 0+1”. Running that, we get a return value of 0x2. Perfect!
Every time a Solidity contract calls a function of another contract, it does so by producing a message call.
Every call has a sender, a recipient, a payload, a value, and an amount of gas. The depth of the message call is limited to less than 1024 levels.
address.call.gas(gas).value(value)(data)
Recent Solidity versions write this as address.call{gas: gas, value: value}(data); the chained .gas(...).value(...) form was removed in Solidity 0.7.0.
gas is the amount of gas to be forwarded
address is the address to be called
value is the amount of Ether to be transferred, in Wei
data is the payload to be sent
gas and value are optional parameters
→ be careful, because almost all of the sender's remaining gas is forwarded by default in a low-level call.
Given that every call can end in an out-of-gas (OOG) exception, at least 1/64th of the sender’s remaining gas is held back to avoid security issues. This allows a sender to handle an inner call’s out-of-gas error and finish its own execution without itself running out of gas and bubbling the exception up.
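The "all but one 64th" rule can be written down as simple integer arithmetic. This is a sketch of the rule as described above, with hypothetical function names; it leaves out the other costs a real CALL pays (such as the base call cost and any value-transfer stipend).

```python
def max_forwardable_gas(available: int) -> int:
    """Gas that can be passed to an inner call: everything except 1/64th."""
    return available - available // 64


def forwarded_gas(available: int, requested=None) -> int:
    """Gas actually forwarded; with no explicit amount, the cap itself is used."""
    cap = max_forwardable_gas(available)
    return cap if requested is None else min(requested, cap)


print(forwarded_gas(64_000))            # default: almost everything is forwarded
print(forwarded_gas(64_000, 10_000))    # an explicit, smaller amount is honoured
```

Whatever is held back stays with the caller, which is what lets it handle an out-of-gas failure in the inner call.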
Memory is a volatile read-write byte-addressable space. It is mainly used to store data during execution, mostly for passing arguments to internal functions. Given this is a volatile area, every message call starts with a cleared memory. All locations are initially defined as zero. Like calldata, memory can be addressed at the byte level, but it is read in 32-byte words.
Memory is said to “expand” when we write to a word in it that was not previously used. In addition to the cost of the write itself, there is a cost to this expansion, which grows linearly for roughly the first 724 bytes and quadratically after that.
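The linear-then-quadratic behaviour comes from the memory cost formula in the Yellow Paper, 3 gas per word plus words² / 512 (integer division). The sketch below assumes that formula and uses hypothetical function names.

```python
def memory_cost(size_bytes: int) -> int:
    """Total gas attributed to a memory of the given size."""
    words = (size_bytes + 31) // 32          # memory is paid for in 32-byte words
    return 3 * words + (words * words) // 512


def expansion_cost(old_size: int, new_size: int) -> int:
    """Extra gas charged when memory grows from old_size to new_size bytes."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))


for size in (32, 704, 736, 32 * 1024):
    print(size, memory_cost(size))
```

The quadratic term is zero up to 22 words (704 bytes) and first contributes at 23 words, which is where the roughly 724-byte figure (the square root of 512, in words, times 32) comes from.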
The EVM provides three opcodes to interact with the memory area:
MLOAD loads a word from memory into the stack.
MSTORE saves a word to memory.
MSTORE8 saves a byte to memory.
Solidity also provides an inline assembly version of these opcodes.
There is another key thing we need to know about memory. Solidity always stores a free memory pointer at position 0x40, i.e. a reference to the first unused word in memory. That’s why we load this word to operate with inline assembly. Since the initial 64 bytes of memory are reserved by Solidity as scratch space, this is how we can ensure that we are not overwriting memory that is used internally by Solidity. For instance, in the delegatecall example presented above, we were loading this pointer to store the given calldata to forward it. This is because the inline-assembly opcode delegatecall needs to fetch its payload from memory.
Additionally, if you pay attention to the bytecode output by the Solidity compiler, you will notice that it starts with 0x6060604052…, which means: PUSH1 0x60, PUSH1 0x40, MSTORE, i.e. store 0x60 at position 0x40, initialising the free memory pointer. Newer compiler versions emit 0x6080604052… instead, which places the initial pointer at 0x80.
You must be very careful when operating with memory at assembly level. Otherwise, you could overwrite a reserved space.
Big Endian for memory (page 69): https://takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf
Storage is a persistent read-write word-addressable space. This is where each contract stores its persistent information. Unlike memory, storage is a persistent area and can only be addressed by words. It is a key-value mapping of 2²⁵⁶ slots of 32 bytes each. A contract can neither read nor write to any storage apart from its own. All locations are initially defined as zero.
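The amount of gas required to save data into storage is one of the highest among operations of the EVM. This cost is not always the same. Modifying a storage slot from a zero value to a non-zero one costs 20,000 gas, while rewriting a non-zero slot, or setting a non-zero value to zero, costs 5,000 gas. However, in the last scenario, when a non-zero value is set to zero, a refund of 15,000 gas is given. These figures reflect the original gas schedule; later hard forks (for example EIP-2929 and EIP-3529) changed access costs and reduced refunds, so check the current specification for exact numbers.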
The EVM provides two opcodes to operate the storage:
SLOAD loads a word from storage into the stack.
SSTORE saves a word to storage.
These opcodes are also supported by the inline assembly of Solidity.
Solidity will automatically map every defined state variable of your contract to a slot in storage. The strategy is fairly simple: statically sized variables (everything except mappings and dynamic arrays) are laid out contiguously in storage starting from position 0.
For dynamic arrays, this slot (p) stores the length of the array, and its data is located starting at the slot number that results from hashing p (keccak256(p)). For mappings, this slot is unused, and the value corresponding to a key k is located at keccak256(k, p). Bear in mind that the parameters of keccak256 (k and p) are always padded to 32 bytes.
storage[0] → 2
storage[1] → address of contract
storage[2] → length of array (its elements start at slot keccak256(2))
storage[3] → unused; the mapping values are stored at keccak256(k, 3)
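The slot arithmetic for the array and the mapping can be sketched in Python. Python's hashlib.sha3_256 is the standardized SHA-3, which pads differently from the Keccak-256 used by Ethereum, so this sketch carries its own compact Keccak-256. The helper names are hypothetical, keys are assumed to be integers, and array elements are assumed to fill one slot each (as for uint256[]).

```python
MASK64 = (1 << 64) - 1


def _rol64(value: int, shift: int) -> int:
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(state: bytearray) -> bytearray:
    # Load the 200-byte state as a 5x5 matrix of little-endian 64-bit lanes.
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: mix each column with its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & MASK64 & row[(x + 2) % 5])
        # iota: round constant generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data: bytes) -> bytes:
    rate = 136                       # bytes (1088-bit rate, 512-bit capacity)
    state = bytearray(200)
    offset = block = 0
    while offset < len(data):        # absorb the input block by block
        block = min(len(data) - offset, rate)
        for i in range(block):
            state[i] ^= data[offset + i]
        offset += block
        if block == rate:
            state = _keccak_f1600(state)
            block = 0
    state[block] ^= 0x01             # original Keccak padding (SHA-3 uses 0x06)
    state[rate - 1] ^= 0x80
    return bytes(_keccak_f1600(state)[:32])


def pad32(value: int) -> bytes:
    return value.to_bytes(32, "big")


def array_element_slot(p: int, index: int) -> int:
    """Slot of element `index` of a dynamic array declared at slot p."""
    return (int.from_bytes(keccak256(pad32(p)), "big") + index) % 2**256


def mapping_value_slot(key: int, p: int) -> int:
    """Slot of the value for `key` in a mapping declared at slot p."""
    return int.from_bytes(keccak256(pad32(key) + pad32(p)), "big")


print(hex(array_element_slot(2, 0)))    # first element of the array at slot 2
print(hex(mapping_value_slot(1, 3)))    # value for key 1 in the mapping at slot 3
```

In practice a maintained Keccak library would normally replace the hand-written permutation; it is inlined here only to keep the sketch dependency-free.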
What is the Ethereum state and how is it modified?
Ethereum is a distributed state machine. Ethereum's state is a machine state, which can change from block to block according to a pre-defined set of rules, and which can execute arbitrary machine code.
The specific rules of changing state from block to block are defined by the EVM.
The state is an enormous data structure called a modified Merkle Patricia Trie, which keeps all accounts linked by hashes and reducible to a single root hash stored on the blockchain.
How does the EVM work? (opcodes, bytecode, stacks etc.)
The EVM executes as a stack machine with a depth of 1024 items. Each item is a 256-bit word.
During execution, the EVM maintains a transient memory, which does not persist between transactions.
Contracts contain a Merkle Patricia storage trie associated with the account in question and part of the global state.
Compiled smart contract bytecode executes as a number of EVM opcodes, which perform standard stack operations like XOR, AND, ADD, SUB, etc. The EVM also implements a number of blockchain-specific stack operations, such as ADDRESS, BALANCE, BLOCKHASH.
How many opcodes are there?
141 opcodes (the exact count changes with hard forks); each opcode is 1 byte, so there can be at most 256.
What are the opcodes used for writing and querying memory and storage?
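MLOAD and MSTORE (memory) - 3 gas each, plus any memory expansion cost
SLOAD and SSTORE (storage) - SLOAD costs 100 gas for a slot already accessed in the transaction (2,100 for a cold slot); the cost of SSTORE depends on the slot's current and new values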
What is the maximum contract size?
24,576 bytes (24 KiB) of deployed runtime code, a limit introduced by EIP-170
What are precompiled contracts and how do they work?
A special kind of contract that is bundled with the EVM at fixed addresses and can be called at a predetermined gas cost.
They are called like regular contracts, with instructions like CALL.
New hardforks may introduce new precompiled contracts
Examples: ecRecover, SHA2-256, identity, modexp, ecAdd, ecMul, ecPairing