Hi,
I've recently needed to use the libffi crate to allow an interpreter to make native calls to external functions whose signatures are not known until runtime.
The examples in the repo use statically-known signatures (and thus static arrays of arguments). They are of limited utility, because if you knew the signatures you wanted to call ahead of time, you'd just call an extern-declared symbol directly without libffi. As far as I know, libffi is intended for scenarios where the type signatures are dynamic.
Once you start dealing with calls with dynamic types, it becomes unclear how best to use the crate safely. For example, my first attempt looked a bit like:
use libffi::middle::*;
use std::ffi::c_void;

// Pretend that this function is from an external dynamic C library.
fn f(a: u64, b: u64, c: u64) {
    println!("a={}, b={}, c={}", a, b, c);
}

fn call_ext(addr: *mut c_void, num_args: usize) {
    let mut args = Vec::new();
    let mut builder = Builder::new();
    for i in 0..num_args {
        args.push(arg(&i)); // Bad!
        builder = builder.arg(Type::u64());
    }
    builder = builder.res(Type::void());
    let cif = builder.into_cif();
    unsafe { cif.call::<()>(CodePtr(addr), &args) };
}

fn main() {
    call_ext(&f as *const _ as *mut c_void, 3);
}
This program is UB because arg() calls Arg::new(), which looks like:
pub fn new<T>(r: &T) -> Self {
    Arg(r as *const T as *mut c_void)
}
So it stashes a raw pointer to its argument, but in the case of my example above:
- The same memory is re-used for &i on each iteration of the loop, so we'd unintentionally be doing the call with 3 identical arguments.
- The storage for i is dead by the time the loop finishes, so each Arg stores a dangling pointer. Using the resulting pointers invokes UB.
(Another gotcha here is that we've inadvertently made a vector of usize args, since we didn't add explicit type annotations. But that's another story.)
My next attempt was to do something like:
use libffi::middle::*;
use std::ffi::c_void;

// Pretend that this function is from an external dynamic C library.
fn f(a: u64, b: u64, c: u64) {
    println!("a={}, b={}, c={}", a, b, c);
}

fn call_ext(addr: *mut c_void, num_args: usize) {
    let mut builder = Builder::new();
    let mut args: Vec<u64> = Vec::new(); // <-----------------
    for i in 0..num_args {
        args.push(u64::try_from(i).unwrap());
        builder = builder.arg(Type::u64());
    }
    let cif = builder.into_cif();
    let ffi_args = args.iter().map(|a| arg(a)).collect::<Vec<Arg>>(); // <-----------------
    unsafe { cif.call::<()>(CodePtr(addr), &ffi_args) };
}

fn main() {
    call_ext(&f as *const _ as *mut c_void, 3);
    println!("Hello, world!");
}
This solves the above problems by:
- ensuring that the storage for each argument has a distinct memory address;
- ensuring that the storage outlives the FFI call.
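Those two properties can be checked without libffi at all. Here is a minimal std-only sketch: RawArg is a hypothetical stand-in that mirrors the Arg::new() quoted above (it is not the crate's type), and collect_raw_args and read_back are names invented for illustration.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for libffi's `Arg`: it only stashes a raw pointer.
struct RawArg(*mut c_void);

impl RawArg {
    fn new<T>(r: &T) -> Self {
        RawArg(r as *const T as *mut c_void)
    }
}

/// Build one `RawArg` per element of `values`. Each pointer refers to a
/// different element of the caller's storage.
fn collect_raw_args(values: &[u64]) -> Vec<RawArg> {
    values.iter().map(|v| RawArg::new(v)).collect()
}

/// Read the pointed-to values back, roughly as a callee eventually would.
///
/// # Safety
/// Every pointer in `args` must still point at a live `u64`.
unsafe fn read_back(args: &[RawArg]) -> Vec<u64> {
    args.iter()
        .map(|a| unsafe { *(a.0 as *const u64) })
        .collect()
}

fn main() {
    let values: Vec<u64> = (0..3).collect();
    let raw = collect_raw_args(&values);
    // `values` is still alive and unmodified here, so the pointers are valid.
    let seen = unsafe { read_back(&raw) };
    println!("{:?}", seen); // prints [0, 1, 2]
}
```

Nothing in the types ties raw to values here, so keeping values alive and untouched until after the call is still on the programmer.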
But I'm still not certain that this is correct. The example assumes that the backing storage of the vector holding the arguments is not moved between collecting the pointers and making the call: a guarantee I don't think we have(?).
My next attempt would be to take a slice from the argument vector. While a slice borrow is live, Rust cannot move the vector's backing storage (if code tried to, the program would (hopefully?) not compile). But having shown this to colleagues, they have concerns about pointer aliasing rules.
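For what it's worth, part of this can be pinned down with std alone. Moving a Vec value copies only its pointer, capacity and length; the heap buffer is reallocated only by operations that resize it, such as a push beyond capacity. A live shared borrow rules those operations out at compile time. The raw pointer stashed by Arg::new(), at least as quoted above, carries no lifetime, so that protection would not automatically extend to it. A small sketch, where buffer_addr_after_move is a hypothetical helper:

```rust
/// Hypothetical helper: takes the Vec by value (a move) and reports where
/// its heap buffer lives afterwards.
fn buffer_addr_after_move(v: Vec<u64>) -> (usize, Vec<u64>) {
    let addr = v.as_ptr() as usize;
    (addr, v)
}

fn main() {
    let args: Vec<u64> = vec![0, 1, 2];
    let before = args.as_ptr() as usize;

    // Moving the Vec moves the (pointer, capacity, length) triple only.
    let (after, mut args) = buffer_addr_after_move(args);
    assert_eq!(before, after);

    let view: &[u64] = &args;
    // args.push(3); // would not compile here: `view` is still used below
    assert_eq!(view.as_ptr() as usize, before);

    // `view` is no longer used, so growing is allowed again. It may
    // reallocate, so any pointer taken earlier must be treated as stale.
    args.push(3);
    println!("buffer stayed put across the move: {}", before == after);
}
```

This says nothing about the aliasing concerns; it only covers whether the storage can move underneath the pointers.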
So my question is: what is the correct and safe way to use libffi::middle to create dynamically-typed calls to external functions?
The only solution I can see is to manually manage an unmovable chunk of memory with malloc() (or using something like the alloc crate). There has to be a better way.
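One possible middle ground, short of calling malloc() by hand, might be to freeze the argument vector into a boxed slice. Vec::into_boxed_slice() yields a heap allocation with no API for growing it, and moving the owner around leaves the element addresses unchanged. The sketch below assumes every argument is a u64; ArgStorage and its methods are hypothetical names, not part of libffi.

```rust
use std::ffi::c_void;

/// Hypothetical owner for argument values. A boxed slice cannot grow, so
/// its heap allocation is not reallocated while it is alive.
struct ArgStorage {
    values: Box<[u64]>,
}

impl ArgStorage {
    fn new(values: Vec<u64>) -> Self {
        ArgStorage { values: values.into_boxed_slice() }
    }

    /// One type-erased pointer per argument, valid while `self` is alive.
    fn pointers(&self) -> Vec<*mut c_void> {
        self.values
            .iter()
            .map(|v| v as *const u64 as *mut c_void)
            .collect()
    }
}

fn main() {
    let storage = ArgStorage::new(vec![0, 1, 2]);
    let before = storage.pointers();
    let moved = storage; // moves the owner, not the heap data
    let after = moved.pointers();
    println!("{}", before == after); // prints true
}
```

The pointers must still not outlive the ArgStorage, so this narrows the problem rather than removing it.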
(FWIW, all of the programs I've shown so far segfault, although when I used the approach from the latter example in my interpreter it did work, perhaps by chance.)
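One thing worth ruling out for the segfaults, separately from the argument storage question, is how the function address is obtained. &f as *const _ as *mut c_void yields the address of a reference to the (zero-sized) function item rather than the function's code address, and f is not declared extern "C". The sketch below uses std only and shows one way to get a callable, type-erased code address; sum3, code_addr and call3 are hypothetical names, and whether this accounts for the crashes above is an assumption I have not verified.

```rust
use std::ffi::c_void;

// Stand-in for an external C function; `extern "C"` gives it the C ABI.
extern "C" fn sum3(a: u64, b: u64, c: u64) -> u64 {
    a + b + c
}

/// Hypothetical helper: the code address of `sum3` as a type-erased pointer.
fn code_addr() -> *mut c_void {
    // Coerce the function item to a function pointer first, then cast.
    let fp: extern "C" fn(u64, u64, u64) -> u64 = sum3;
    fp as *const c_void as *mut c_void
}

/// Call through the type-erased address, roughly what an FFI layer does.
///
/// # Safety
/// `addr` must be the address of an `extern "C" fn(u64, u64, u64) -> u64`.
unsafe fn call3(addr: *mut c_void, a: u64, b: u64, c: u64) -> u64 {
    let f: extern "C" fn(u64, u64, u64) -> u64 =
        unsafe { std::mem::transmute(addr) };
    f(a, b, c)
}

fn main() {
    let addr = code_addr();
    println!("{}", unsafe { call3(addr, 1, 2, 3) }); // prints 6
}
```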
(Side question: you can't use libffi::high to do dynamic calls, can you? You'd need the ability to dynamically create a Rust type signature as far as I can see, and if you could do that, you wouldn't need libffi.)
Thanks!