Comments (17)
Hello, I'm the author of the tool cargo-geiger. Thank you @dtolnay for yet another significant open source Rust contribution, your work is amazing, period. This approach to C++ FFI looks wonderful from my perspective.
@bjorn3 I think I can see the point you are making: that it should be easy to find out whether a crate depends on any unsafe code, including FFI? Let's continue that discussion, including detection of unsafe code through macro expansion, over at the cargo-geiger repo.
from cxx.
I understand what you're saying, but I don't find it to be a distinction that would be valuable for maintaining the correctness of a codebase in practice.
My approach is as follows:
- Fact 1: unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.
- Fact 2: all C++ is unsafe.
- Fact 3: if you don't need to audit the Rust code, it doesn't need to be unsafe.
from cxx.
> Fact 1: unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.

That's already not true, though. However, I have to admit that this particular refutation has little bearing on the discussion at hand.
I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block. This property is violated by cxx, because C++ code is not inside a literal unsafe block. Usually, writing `unsafe { extern_fn() }` is the place where the proof obligation for the C++ code occurs.
> Consider the following: suppose you were using traditional bindgen to expose an unsafe FFI which you then wrapped in a safe Rust API, as is common practice. That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...
No, it doesn't -- so? The `unsafe` block on the Rust side crucially marks the place where the proof obligation is checked, and of course if anything that is relied upon by the corresponding proof changes, it needs to be re-checked. This is true even for pure Rust code: calling some `unsafe fn` in a correct way does not give you the right to later change that unsafe Rust code and alter its safety contract; you have to re-check all uses at that point.
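A minimal pure-Rust sketch of that discipline (the function and its contract are invented for illustration): the `// SAFETY` comment at the call site records the proof, and that proof has to be re-checked whenever the contract of the `unsafe fn` changes.

```rust
/// Returns the byte at `index` without a bounds check.
///
/// # Safety
/// The caller must guarantee `index < slice.len()`.
unsafe fn byte_at(slice: &[u8], index: usize) -> u8 {
    unsafe { *slice.get_unchecked(index) }
}

fn second(data: &[u8; 3]) -> u8 {
    // SAFETY: `data.len() == 3`, so index 1 is in bounds. If the
    // safety contract of `byte_at` is ever changed, this
    // justification must be re-checked.
    unsafe { byte_at(data, 1) }
}

fn main() {
    println!("{}", second(&[10, 20, 30])); // prints 20
}
```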
@dtolnay on the other hand views C++ code as implicitly carrying an unsafe block all around it.
I can understand this view, but I think the issue with this is the lack of a clear proof obligation. `unsafe` blocks aren't about the exact auditing scope (see above); they are all about marking where exactly which proof obligations arise. (This is consistent with my blog post: the proof obligations for `set_len` start at some `unsafe` block in `Vec` that would cause UB with an incorrect length; that obligation is discharged via `Vec`'s datastructure invariant, and that invariant in turn puts an obligation on the entire module.)
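That structure can be sketched with a made-up `Buf` type standing in for `Vec`: the obligation created by the `unsafe` block in `as_slice` is discharged by a datastructure invariant, and the invariant in turn obligates every function in the module that can touch `len`, including the `unsafe fn set_len`.

```rust
mod buf {
    /// Invariant: `self.len <= self.data.len()`.
    /// Every function in this module shares the obligation to keep it.
    pub struct Buf {
        data: Vec<u8>,
        len: usize,
    }

    impl Buf {
        pub fn new(data: Vec<u8>) -> Buf {
            let len = data.len();
            Buf { data, len } // invariant holds: len == data.len()
        }

        pub fn capacity(&self) -> usize {
            self.data.len()
        }

        /// # Safety
        /// The caller must guarantee `new_len <= self.capacity()`,
        /// i.e. re-establish the invariant.
        pub unsafe fn set_len(&mut self, new_len: usize) {
            self.len = new_len;
        }

        pub fn as_slice(&self) -> &[u8] {
            // SAFETY: relies on the module invariant; an incorrect
            // `len` would make this unchecked slicing UB.
            unsafe { self.data.get_unchecked(..self.len) }
        }
    }
}

fn main() {
    let mut b = buf::Buf::new(vec![1, 2, 3, 4]);
    // SAFETY: 2 <= b.capacity() == 4, so the invariant is re-established.
    unsafe { b.set_len(2) };
    println!("{:?}", b.as_slice()); // prints [1, 2]
}
```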
With cxx, what has to be proven about the C++ code is determined very non-locally by the cxx bridge file. The C++ type system is too weak to tell you what you have to prove. Crucially, that cxx bridge file doesn't even have any `unsafe` in it, so it is not clear that there is some proof obligation being determined here. (This is basically the point @ogoffart is making above as well; I wonder what your thoughts on that are, @dtolnay.)
from cxx.
But by that definition, extern functions should not be unsafe. But they are:

```rust
extern "C" {
    fn init_stuff();
}

fn main() {
    init_stuff(); // Error: call to unsafe function is unsafe and requires unsafe function or block
}
```

Why is calling `init_stuff()` unsafe? I could claim that calling it is not unsafe because it is the C code that needs to be audited.
from cxx.
Moderator note: I am deleting comments because GitHub's linear comment structure makes it unmanageable to hold a complex forking discussion here. Discussions that are not the dominant thread would be better held elsewhere.
from cxx.
100% of C++ code is unsafe. When auditing a project, you would be on the hook for auditing all the unsafe Rust code and all the C++ code. The core safety claim under this new model is that auditing just the C++ side would be sufficient to catch all problems, i.e. the Rust side can be 100% safe.
from cxx.
> There's an interpretation of safe Rust code which allows one to draw a clear boundary around what should be marked unsafe, which is that it should not be possible to cause undefined behavior with safe Rust code.

That's certainly the case here. You can't cause undefined behavior with just safe Rust code. You need to add some unsafe C++ code in order to get undefined behavior.

> Consider the following: suppose you were using traditional bindgen to expose an unsafe FFI which you then wrapped in a safe Rust API, as is common practice. That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

With CXX you can mess up the Rust code arbitrarily badly without getting UB, so the Rust code is safe.
from cxx.
> But by that definition, extern functions should not be unsafe.
>
> `extern { fn init_stuff(); }`
>
> Why is calling `init_stuff()` unsafe? I could claim that calling it is not unsafe because it is the C code that needs to be audited.
In the case of a handwritten extern, the two reasons it's unsafe are:

- You might have the wrong signature. You don't get to mess up the Rust side arbitrarily badly (after reviewing what is being exposed from C++) without getting UB. CXX solves this.
- Not all C++ functions with the type `void (*)()` are free of proof obligations.
CXX is designed for C++ functions that are fine to call with any possible value of their arguments and this would be something that an audit of the C++ side of the bridge would be responsible for verifying (but this definitely should be called out better in the documentation).
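As an illustration (both functions here are invented, not part of CXX): the first is total over its entire argument domain, so a bridge could soundly expose it as safe; the second has the same general shape but carries a precondition that no signature on either side of the FFI boundary can express.

```rust
// Fine to call with any possible argument values: no precondition,
// so exposing it through a safe binding would be sound.
extern "C" fn saturating_add(a: u32, b: u32) -> u32 {
    a.checked_add(b).unwrap_or(u32::MAX)
}

// NOT fine with arbitrary arguments: `len` must not exceed the
// allocation behind `p`. The C/C++ type system cannot state this,
// so only an audit of the implementation side can establish it.
unsafe extern "C" fn sum_bytes(p: *const u8, len: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..len {
        // SAFETY: the caller promised `len` bytes behind `p`.
        total += unsafe { *p.add(i) } as u64;
    }
    total
}

fn main() {
    println!("{}", saturating_add(u32::MAX, 1)); // prints 4294967295
    let data = [1u8, 2, 3];
    // SAFETY: `data` is 3 bytes long, so `len == 3` is in bounds.
    println!("{}", unsafe { sum_bytes(data.as_ptr(), data.len()) }); // prints 6
}
```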
from cxx.
I understand. Making the Rust side safe could, however, lead a tool like cargo-geiger to think there is nothing to audit, since it doesn't peek into the macro expansion.
from cxx.
The user of cargo-geiger would need to know that if there is non-Rust code in the project then Rust's safety guarantees obviously do not apply to the non-Rust code.
I don't know much about cargo-geiger but hopefully there would be some way for it to flag the presence of non-Rust code.
from cxx.
There's an interpretation of safe Rust code which allows one to draw a clear boundary around what should be marked unsafe, which is that it should not be possible to cause undefined behavior with safe Rust code. Do I understand correctly that the intended usage of cxx is to write the C++ side in such a way as to satisfy this Rust-side requirement, rather than marking the functions unsafe from the Rust side?
from cxx.
> That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

Yes, but in that case you use `unsafe {}` to claim that the C++ doesn't cause UB. With cxx you can call C++ code which causes UB without ever using the `unsafe` keyword.
from cxx.
> I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block.

Indeed.

> Crucially, that cxx bridge file doesn't even have any unsafe in it so it is not clear that there is some proof obligation being determined here.

Maybe add an `#[unsafe_assume_no_ub]` attribute (anyone have a better name?) which makes the function safe to call, with the default being unsafe when `#[unsafe_assume_no_ub]` is not used?
from cxx.
@RalfJung I think the post you linked reinforces my perspective. The emphasis is that "unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior." As with the Vec implementation you talk about, looking at one piece of unsafe code can expose other code not necessarily bearing the unsafe keyword that also needs to be considered, but through the process you discover all problems and can convince yourself that the whole system is good.
Your wording of "C++ code as implicitly carrying an unsafe block all around it" is accurate. This is just how that language works, and how audits of FFI to it always work. https://www.reddit.com/r/rust/comments/elvfyn/ffi_like_its_2020_announcing_safe_ffi_for_rust_c/fdm8olr/ covers this in more detail from the perspective of real practical users of FFI. C++ isn't about clear formal proof obligations, it relies on the programmer to understand what they are looking at in broader context; this is pervasive to the language and applies here.
I would add that in addition to C++ code carrying an implicit unsafe block all around it, Cargo build scripts also always carry an implicit unsafe block all around them and must be audited to discover the necessary understanding of the system and its safety requirements.
from cxx.
> I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block. This property is violated by cxx, because C++ code is not inside a literal unsafe block. Usually, writing `unsafe { extern_fn() }` is the place where the proof obligation for the C++ code occurs.
I don't find this to be a practical litmus test nor helpful for upholding the correctness of a codebase in practice. I've called out that all C++ and all build scripts implicitly carry the unsafe you are looking for. To drive this home, consider:
```rust
// build.rs
fn main() {
    let f = curl("https://gist.github.com/dtolnay/.../mystery.rs");
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::write(out_dir.join("mystery.rs"), f).unwrap();
}
```

```rust
// src/lib.rs
include!(concat!(env!("OUT_DIR"), "/mystery.rs"));
```
What do we grep for to find the undefined behavior in this project? The solution isn't to make curling unsafe, or `fs::write` unsafe, or `include!` unsafe. I would see that as grasping to justify an incorrect litmus test.
from cxx.
> CXX is designed for C++ functions that are fine to call with any possible value of their arguments

Does CXX codegen automatically prove this? If not, why are the generated bindings safe?
from cxx.
> The emphasis is that "unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior."

I would extend this to say "to discover all sources of undefined behavior and the proof obligations to avoid it". And that's where cxx's approach falls short: even if we accept that C++ has an implicit unsafe block all around it, that tells us nothing about what the actual proof obligation is.
And you are right that build scripts (and most forms of code generation) break this. There's not much we can do about that. Item-level proc-macros sometimes have a similar problem, and the usual approach seems to be to call them something with `unsafe` in their name -- that would match @bjorn3's proposal of an `#[unsafe_assume_no_ub]` attribute.
> CXX is designed for C++ functions that are fine to call with any possible value of their arguments and this would be something that an audit of the C++ side of the bridge would be responsible for verifying (but this definitely should be called out better in the documentation).

The part about "any possible value of their arguments" might be worth calling out more explicitly in the docs, maybe?
from cxx.