This note suggests a different, potentially more efficient, simpler, and 100% safe implementation of the VM's value representation. The sketched design is influenced by the recent implementation of Move on the EVM.
There are three major components to this alternative model:
- Use byte-addressable linear memory to reason about Move values and references.
- Make access to this memory fast via unsafe *mut, but ensure the public value API is safe.
- Erase runtime type information from values, to the extent that a safe representation can still be guaranteed. Bytecode verification should already guarantee type correctness. If the bytecode is not type-correct, VM execution will still be safe, but the result undetermined.
Basic Model
Borrow semantics is interpreted over a model of linear addressable memory, including pointer indirection and address arithmetic. The claim here is that even though this model is tailored for microprocessors, it is also the best fit for a virtual machine.
Given this approach, values are simply represented as regions of linear memory. The interpretation of the data in this memory is discussed later.
struct Value<'a>(*mut [u8], PhantomData<&'a ()>);
The lifetime 'a here is bound to a heap of values, e.g. one associated with a Move VM session. The implementation ensures safety for all usages of Value<'a>, guaranteed by Rust lifetime checking. The heap can be a simple arena, as sessions are short-lived (this is how the EVM does it), but more investigation is needed here. Similar to the heap, there is a stack frame which is also linearly addressable, as well as slots for global values.
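To make this concrete, here is a minimal sketch of what an arena-backed heap handing out such values could look like. This is illustrative only: the Heap type, its fixed-capacity Vec<u8> arena, and the alloc API are assumptions of this sketch, not the actual design; the key point is that the arena never reallocates while values are live, so the raw pointer stays valid for 'a.

```rust
use std::marker::PhantomData;

// Hypothetical arena: a fixed-capacity byte buffer that is never reallocated
// while values borrowed from it are alive.
struct Heap {
    bytes: Vec<u8>,
}

// The value representation from the note: a raw slice pointer whose validity
// is tied to the heap via the phantom lifetime.
struct Value<'a>(*mut [u8], PhantomData<&'a ()>);

impl Heap {
    fn new(capacity: usize) -> Self {
        Heap { bytes: Vec::with_capacity(capacity) }
    }

    // Allocate `size` zeroed bytes; the returned Value cannot outlive the heap.
    fn alloc<'a>(&'a mut self, size: usize) -> Value<'a> {
        assert!(
            self.bytes.len() + size <= self.bytes.capacity(),
            "arena exhausted"
        );
        let start = self.bytes.len();
        self.bytes.resize(start + size, 0); // stays within capacity: no realloc
        Value(&mut self.bytes[start..start + size] as *mut [u8], PhantomData)
    }
}

impl<'a> Value<'a> {
    fn as_slice(&self) -> &[u8] {
        // Safe by construction: the arena neither moves nor shrinks within 'a.
        unsafe { &*self.0 }
    }
    fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { &mut *self.0 }
    }
}

fn main() {
    let mut heap = Heap::new(1024);
    let mut v = heap.alloc(2);
    v.as_mut_slice()[0] = 42;
    assert_eq!(v.as_slice()[0], 42);
    println!("ok");
}
```

A real implementation would need interior mutability (or unsafe) in the arena so that multiple live values can be allocated from the same heap; the single-allocation case above is just enough to show the lifetime coupling.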
Types and Conversions
Values are opaque. One needs a type to interpret a value, e.g.
let val: Value<'a>;
U8_TYPE.write(val, U8_TYPE.read(val)? + 1)?;
For native functions, types are either known statically, or are passed as parameters if those functions are generic.
We expect that conversion via types is safe. However, because type information is erased from values, only restricted forms of runtime checks can be performed (e.g. whether the value's memory slice has the correct size). If the wrong type is used on a value, this will always be safe for Rust, but may result in undetermined values for Move.
We otherwise expect that conversions will be very fast. Specifically, Rust already has builtin primitives to convert a [u8; N] into a u8/u64/u128; these can be expected to compile to single machine instructions.
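The read/write API used in the example above could be sketched as follows. The U64Type struct here is an illustrative stand-in for the note's type objects (like U8_TYPE), and little-endian layout is an assumption; the size check on the slice is exactly the restricted runtime check mentioned above.

```rust
// Hypothetical type object, standing in for the note's U8_TYPE/U64_TYPE etc.
struct U64Type;

impl U64Type {
    // After type erasure, the only check possible is the slice size.
    fn read(&self, bytes: &[u8]) -> Option<u64> {
        let arr: [u8; 8] = bytes.try_into().ok()?;
        // u64::from_le_bytes compiles to a single load on little-endian targets.
        Some(u64::from_le_bytes(arr))
    }

    fn write(&self, bytes: &mut [u8], v: u64) -> Option<()> {
        let arr: &mut [u8; 8] = bytes.try_into().ok()?;
        *arr = v.to_le_bytes();
        Some(())
    }
}

fn main() {
    let mut mem = [0u8; 8];
    U64Type.write(&mut mem, 41).unwrap();
    let v = U64Type.read(&mem).unwrap();
    U64Type.write(&mut mem, v + 1).unwrap();
    assert_eq!(U64Type.read(&mem), Some(42));
    // Wrong-sized slice: the size check rejects it instead of misreading.
    assert_eq!(U64Type.read(&mem[..4]), None);
    println!("ok");
}
```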
Structs and Vectors
A struct is represented by a slice of memory into which the field data is packed in consecutive bytes. That is, if the struct has a u8 field followed by a u128 field, the first occupies the first byte and the second the remaining 16 bytes; the size of the struct value's memory slice is thus 17.
Borrowing a field from a struct is then simply sub-slicing the struct's slice (Value(r) => Value(r[offs..offs+size])).
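A sketch of this sub-slicing, using plain Rust slices in place of the Value wrapper and the packed (u8, u128) layout from the example above. The borrow_field helper and the hard-coded offsets are illustrative assumptions.

```rust
// Borrowing a field = sub-slicing the struct's byte region at (offset, size).
fn borrow_field(struct_bytes: &mut [u8], offset: usize, size: usize) -> &mut [u8] {
    &mut struct_bytes[offset..offset + size]
}

fn main() {
    // 17 bytes: byte 0 holds the u8 field, bytes 1..17 the u128 field.
    let mut s = [0u8; 17];

    let f0 = borrow_field(&mut s, 0, 1); // the u8 field
    f0[0] = 7;

    let f1 = borrow_field(&mut s, 1, 16); // the u128 field
    f1.copy_from_slice(&300u128.to_le_bytes());

    assert_eq!(s[0], 7);
    assert_eq!(u128::from_le_bytes(s[1..17].try_into().unwrap()), 300);
    println!("ok");
}
```

Mutating through the returned sub-slice mutates the struct in place, with no indirection or allocation involved.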
Vectors, because they are dynamically sized, need to be represented by a reference to some other place on the heap, that is, as an embedded Value<'a>. How can we ensure safety if we read such a value from a struct? The answer here is to simply range-check whether the memory slice represented by the read value lies within the currently allocated Rust memory of the heap. One simple way to make this possible is to never give memory allocated for values back to Rust for the duration of the 'a lifetime. This is adequate because sessions are short-lived.
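The range check itself is cheap. A hedged sketch, assuming the embedded value is decoded to a (start, length) pair relative to the arena (the encoding and the check_in_bounds helper are assumptions of this sketch):

```rust
// Accept a decoded (start, len) pair only if it lies entirely within the
// arena's currently allocated bytes; checked_add rejects overflowing lengths.
fn check_in_bounds(arena_len: usize, start: usize, len: usize) -> bool {
    start
        .checked_add(len)
        .map_or(false, |end| end <= arena_len)
}

fn main() {
    // Stand-in for the arena; its memory is only returned to Rust after 'a ends.
    let arena = vec![0u8; 64];

    assert!(check_in_bounds(arena.len(), 0, 64));
    assert!(check_in_bounds(arena.len(), 16, 32));
    assert!(!check_in_bounds(arena.len(), 60, 8)); // past the end
    assert!(!check_in_bounds(arena.len(), usize::MAX, 1)); // overflow rejected
    println!("ok");
}
```

Because the arena only grows during a session, a pair that passes this check always denotes live memory, even if the bytes were produced by type-incorrect bytecode; the result may be an undetermined Move value, but never an out-of-bounds access.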
For nested structs, two options exist: they can be inlined, or they can be represented, similarly to vectors, via an embedded value. Both approaches have advantages and disadvantages: the first is faster and more compact for access, but makes moving the nested struct require a memcpy.
Evaluation
In the current VM implementation, values are represented by nested Rust enums:
enum ValueImpl {
    U8(u8),
    U64(u64),
    ...
    Container(Container),
}

enum Container {
    Locals(Rc<RefCell<Vec<ValueImpl>>>),
    Vec(Rc<RefCell<Vec<ValueImpl>>>),
    ...
}
This representation leads to multiple indirections and memory allocations, and to heavy case switching in the code. The suggested representation is expected to be significantly more compact and more efficient with regard to access and mutation. It should moreover minimize memory fragmentation, specifically if an arena-style allocator is used, which can be very important for performance on modern processor architectures.