
abomonation's Introduction

Abomonation

A mortifying serialization library for Rust

Abomonation (spelling intentional) is a serialization library for Rust based on the very simple idea that if someone presents data for serialization it will copy those exact bits, and then follow any pointers and copy those bits, and so on. When deserializing it recovers the exact bits, and then corrects pointers to aim at the serialized forms of the chased data.
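Here is a minimal, hypothetical sketch of the encoding half for a Vec<u64>, using only the standard library. It copies the exact bits of the Vec value (pointer, capacity and length, in whatever layout rustc chose). It then follows the heap pointer and appends the elements. The name encode_vec_u64 is illustrative and is not part of abomonation's API. Like abomonation itself, the sketch assumes the Vec header contains no padding bytes. It leaves out the decoding step, which would patch the copied pointer to aim at the trailing bytes.

```rust
use std::mem::size_of;

/// Hypothetical sketch (not abomonation's code): serialize a Vec<u64> by
/// copying the bits of the Vec value verbatim, then the bits its pointer refers to.
/// Takes &Vec rather than &[u64] on purpose: the Vec header itself is copied.
fn encode_vec_u64(v: &Vec<u64>, out: &mut Vec<u8>) {
    // Step 1: the exact bits of the Vec value (pointer, capacity, length).
    // SAFETY (sketch): reads size_of::<Vec<u64>>() bytes of a live value and
    // assumes, as abomonation does, that none of them are padding.
    let header: &[u8] = unsafe {
        std::slice::from_raw_parts(v as *const Vec<u64> as *const u8, size_of::<Vec<u64>>())
    };
    out.extend_from_slice(header);

    // Step 2: follow the heap pointer and copy the bits it points at.
    for x in v.iter() {
        out.extend_from_slice(&x.to_ne_bytes());
    }
}
```

A decoder built on the same idea would read the header back, find the element bytes right after it, and overwrite the copied pointer so that it aims at them.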

Warning: Abomonation should not be used on any data you care strongly about, or from any computer you value the data on. The encode and decode methods do things that may be undefined behavior, and you shouldn't stand for that. Specifically, encode exposes padding bytes to memcpy, and decode doesn't much respect alignment.

Please consult the abomonation documentation for more specific information.

Here is an example of using Abomonation. It is very easy to use. Frighteningly easy.

extern crate abomonation;
use abomonation::{encode, decode};

fn main() {
    // create some test data out of abomonation-approved types
    let vector: Vec<(u64, String)> = (0..256u64).map(|i| (i, format!("{}", i))).collect();

    // encode vector into a Vec<u8>
    let mut bytes = Vec::new();
    unsafe { encode(&vector, &mut bytes); }

    // unsafely decode a &Vec<(u64, String)> from binary data (maybe your utf8 are lies!).
    if let Some((result, remaining)) = unsafe { decode::<Vec<(u64, String)>>(&mut bytes) } {
        assert!(result == &vector);
        assert!(remaining.len() == 0);
    }
}

When you use Abomonation things may go really fast. That is because it does so little work, and mostly just copies large hunks of memory. Typing

cargo bench

will trigger Rust's benchmarking infrastructure (or an error if you are not using nightly; bad luck). The benchmarks repeatedly encode Vec<u64>, Vec<String>, and Vec<Vec<(u64, String)>>, giving numbers like:

test u64_enc        ... bench:         131 ns/iter (+/- 58) = 62717 MB/s
test string10_enc   ... bench:       8,784 ns/iter (+/- 2,791) = 3966 MB/s
test vec_u_s_enc    ... bench:       8,964 ns/iter (+/- 1,439) = 4886 MB/s

They also repeatedly decode the same data, giving numbers like:

test u64_dec        ... bench:           2 ns/iter (+/- 1) = 4108000 MB/s
test string10_dec   ... bench:       1,058 ns/iter (+/- 349) = 32930 MB/s
test vec_u_s_dec    ... bench:       1,232 ns/iter (+/- 223) = 35551 MB/s

These throughputs are so high because there is very little to do: internal pointers need to be corrected, but in their absence (e.g. u64) there is literally nothing to do.

Be warned that these numbers are not goodput, but rather the total number of bytes moved, which is equal to the in-memory representation of the data. On a 64-bit system, a String requires 24 bytes plus one byte per byte of UTF-8 content (one per character for ASCII text), which can be a lot of overhead for small strings.
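The gap between bytes moved and goodput can be estimated with a small helper. This sketch assumes that the count is the in-memory footprint described above: the Vec header, one String header per element stored in the Vec's buffer, and then each string's bytes. The names bytes_moved and payload_bytes are illustrative and are not abomonation functions.

```rust
use std::mem::size_of;

/// Estimated bytes moved when a Vec<String> is serialized: the Vec header,
/// one String header per element (held in the Vec's buffer), and then each
/// string's UTF-8 contents.
fn bytes_moved(v: &[String]) -> usize {
    size_of::<Vec<String>>() + v.len() * size_of::<String>() + payload_bytes(v)
}

/// Goodput: only the string contents themselves.
fn payload_bytes(v: &[String]) -> usize {
    v.iter().map(|s| s.len()).sum()
}
```

For the 256 strings "0" through "255", payload_bytes gives 658 while bytes_moved gives 6,826 on a 64-bit target. Under this accounting, less than 10% of the traffic is string content.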

unsafe_abomonate!

Abomonation comes with the unsafe_abomonate! macro, which implements Abomonation for structs that are essentially equivalent to a tuple of other Abomonation types. To use the macro, you must put the #[macro_use] attribute before extern crate abomonation;.

Please note that unsafe_abomonate! synthesizes unsafe implementations of Abomonation, and it should be considered unsafe to invoke.

#[macro_use]
extern crate abomonation;
use abomonation::{encode, decode};

#[derive(Eq, PartialEq)]
struct MyStruct {
    pub a: String,
    pub b: u64,
    pub c: Vec<u8>,
}

// (type : field1, field2 .. )
unsafe_abomonate!(MyStruct : a, b, c);

fn main() {
    // create some test data out of abomonation-approved types
    let record = MyStruct{ a: "test".to_owned(), b: 0, c: vec![0, 1, 2] };

    // encode record into a Vec<u8>
    let mut bytes = Vec::new();
    unsafe { encode(&record, &mut bytes); }

    // decode a &MyStruct from binary data
    if let Some((result, remaining)) = unsafe { decode::<MyStruct>(&mut bytes) } {
        assert!(result == &record);
        assert!(remaining.len() == 0);
    }
}

Be warned that implementing Abomonation for your own types can be a giant disaster and is entirely discouraged.
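The "tuple of fields" idea behind the macro can be illustrated without the crate. This sketch is hypothetical. It uses a made-up Extent trait that stands in for the part of Abomonation that sizes out-of-line data, plus a made-up tuple_like_extent! macro. Neither is what unsafe_abomonate! actually expands to. The only point is that the struct's work is delegated to each listed field in turn.

```rust
/// Hypothetical stand-in for Abomonation: size of the out-of-line (heap) data.
trait Extent {
    fn extent(&self) -> usize;
}

impl Extent for u64 {
    fn extent(&self) -> usize {
        0 // plain data, nothing behind a pointer
    }
}

impl Extent for String {
    fn extent(&self) -> usize {
        self.len()
    }
}

impl Extent for Vec<u8> {
    fn extent(&self) -> usize {
        self.len()
    }
}

/// Hypothetical macro in the spirit of unsafe_abomonate!: treat the struct as
/// a tuple of the listed fields and sum their contributions in order.
macro_rules! tuple_like_extent {
    ($t:ty : $($field:ident),*) => {
        impl Extent for $t {
            fn extent(&self) -> usize {
                0 $(+ self.$field.extent())*
            }
        }
    };
}

struct MyStruct {
    a: String,
    b: u64,
    c: Vec<u8>,
}

tuple_like_extent!(MyStruct : a, b, c);
```

If a heap-owning field is left out of the list, this sum is silently wrong. In the real macro, the equivalent mistake leaves a pointer uncorrected after decoding, which is one reason invoking it is unsafe.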

abomonation's People

Contributors: antiguru, frankmcsherry, milibopp, sdht0


abomonation's Issues

Unsound usages of `VecFromRawParts`

Hi, I am scanning abomonation in the latest version with my own static analyzer tool.

Unsafe conversion found at: src/lib.rs#L496

#[inline]
unsafe fn exhume<'a,'b>(&'a mut self, bytes: &'b mut [u8]) -> Option<&'b mut [u8]> {

    // extract memory from bytes to back our vector
    let binary_len = self.len() * mem::size_of::<T>();
    if binary_len > bytes.len() { None }
    else {
        let (mine, mut rest) = bytes.split_at_mut(binary_len);
        let slice = std::slice::from_raw_parts_mut(mine.as_mut_ptr() as *mut T, self.len());
        std::ptr::write(self, Vec::from_raw_parts(slice.as_mut_ptr(), self.len(), self.len()));
        for element in self.iter_mut() {
            let temp = rest;             // temp variable explains lifetimes (mysterious!)
            rest = element.exhume(temp)?;
        }
        Some(rest)
    }
}

This unsound use of Vec::from_raw_parts could create a dangling pointer if mine is dropped automatically before rest is used. The std::mem::forget function can be used to avoid the issue.

This could cause undefined behavior in Rust. If the converted values are manipulated further, it could lead to other consequences such as use-after-free (UAF) or double free. I am reporting this issue for your attention.
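The report points at a general ownership rule. Every Vec built with Vec::from_raw_parts must be the only value that will ever free that buffer. Any other would-be owner must have its destructor suppressed with std::mem::forget or std::mem::ManuallyDrop. The sketch below shows this rule on memory that really was allocated by a Vec of the same type. It is not a fix for the exhume code above, whose buffer was never allocated as a Vec<T>. The name rebuild_with_single_owner is hypothetical.

```rust
use std::mem::ManuallyDrop;

/// Decompose a Vec and rebuild it so that exactly one value owns the buffer.
fn rebuild_with_single_owner(v: Vec<u64>) -> Vec<u64> {
    // Suppress the original's destructor; from here on nothing frees the buffer
    // except the Vec we construct below.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: ptr/len/cap come from a live Vec<u64> whose destructor will never
    // run, so the rebuilt Vec is the sole owner and frees the buffer exactly once.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

If ManuallyDrop were left out here, both the original and the rebuilt Vec would free the same allocation, which is the double free the report warns about.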

Is it a good idea to implement Abomonation for non-abomonable PhantomData?

So, while resolving the memory safety issue of #28 that you pointed out in #27, I paused when I reached the implementation of Abomonation for PhantomData.

Currently, abomonation provides an impl of Abomonation for PhantomData<T> even if T is not abomonable. This is by design, as there is a test checking that this impl is available. And it is certainly technically correct to the first order of approximation: since PhantomData contains no data, it is trivially serializable.

Where I get uneasy, though, is when I consider how PhantomData<T> is typically used. By and large, the main use for this marker type in the wild is in container classes like Box and Vec, where you get types which only hold a *mut T, NonNull<T>, or index into some kind of arena of T, but need to tell rustc that they "logically own" one or more Ts, so that Send, Sync, Drop and other stuff that gets automatically implemented works as expected.

From this perspective, if a type contains a PhantomData<T>, it should almost certainly be regarded as containing a T by abomonation too. In which case we should require that this T be abomonable.

What do you think about this train of thought?
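The proposed change can be sketched with a hypothetical Extent trait standing in for Abomonation. Only the impl bound changes, from any T to T: Extent. The ArenaIndex type is an illustrative example of the "index into an arena of T" pattern mentioned above, not a type from abomonation. With the bound in place, such a wrapper is serializable only when the T it logically owns is.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the Abomonation trait.
trait Extent {
    fn extent(&self) -> usize;
}

impl Extent for u64 {
    fn extent(&self) -> usize {
        0
    }
}

/// The bound under discussion: PhantomData<T> is serializable only if T is.
impl<T: Extent> Extent for PhantomData<T> {
    fn extent(&self) -> usize {
        0 // still carries no data of its own
    }
}

/// Typical pattern: an index into an arena that logically owns a T.
struct ArenaIndex<T> {
    index: usize,
    marker: PhantomData<T>,
}

impl<T> ArenaIndex<T> {
    fn get<'a>(&self, arena: &'a [T]) -> &'a T {
        &arena[self.index]
    }
}

/// Because of the bound, this impl only applies when T: Extent.
impl<T: Extent> Extent for ArenaIndex<T> {
    fn extent(&self) -> usize {
        self.marker.extent() // `index` is plain data with no out-of-line bytes
    }
}
```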

NonZeroI16 is nightly only

This breaks abomonation, and, because differential depends on abomonation 0.7.*, by transitivity breaks differential as well:

impl Abomonation for NonZeroI16 { }

The problem is that NonZeroI16 and related types are deprecated after rustc 1.26 and also marked nightly only for some reason.

Sanitize addresses in serialized data.

As of 0.5 Abomonation doesn't automatically sanitize addresses in serialized data. This is mainly due to requiring random access to the post-serialized data, which means (roughly) a &mut [u8] interface to the written data, and not all W: Write provide this.

Instead, we could add back something like

pub fn sanitize<T: Abomonation>(bytes: &mut [u8])

which would treat bytes as a &T and erase the associated memory-address holding fields.

I'm not 100% certain what the right way to erase the fields is, as the packing of exciting discriminant information into such fields is recent sport for the Rust folks. It could just be pushing a 0x01 in there (what it used to be), but this could change at a moment's notice, I would guess.
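A minimal sketch of the erasing step follows. It assumes that the caller already knows the byte offsets of the address-holding fields. Rust does not guarantee those offsets for types like Vec or String, and that is the hard part of a real sanitize<T>. The function sanitize_at is hypothetical. It writes the 0x01 value mentioned above, as a native-endian usize, at each offset.

```rust
/// Hypothetical sketch: overwrite pointer-sized fields at known offsets with 0x01.
/// Panics if an offset plus the pointer width runs past the end of `bytes`.
fn sanitize_at(bytes: &mut [u8], pointer_offsets: &[usize]) {
    // 0x01 is non-null, so it does not collide with the null niche that
    // NonNull-based fields rely on.
    let marker = 1usize.to_ne_bytes();
    for &off in pointer_offsets {
        bytes[off..off + marker.len()].copy_from_slice(&marker);
    }
}
```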

Define a framing protocol

(from a chat with @frankmcsherry:)

This should allow writing to files and sockets, while accommodating multiple use cases (possibly with different framing structures).

Perhaps Abomonation should have read and write methods for readers and writers, and it does the framing for you and doesn't give the choice of forgetting.

Things to keep in mind:

  • If each abomonated object is prepended by a length marker: "the length is also handy in that it lets you zip through an array faster (moving from object to object, rather than deserializing each to determine the length)."

A Framed struct:

A struct Framed<T: Abomonation> { len: usize, data: T } which then implements Abomonation, but in a magical special way where maybe (i) len is written as part of abomonation, or maybe (ii) len is computed by fake serialization (relatively cheap, without traversing all the data).
This has other advantages, like allocating enough memory to write T up front rather than repeatedly growing / copying the Vec.

  • More complex headers may be folded into the abomonated T? For example, what should we do with the message headers in timely_communication?
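Here is a hypothetical length-prefixed framing over std::io. Each frame is an 8-byte little-endian length followed by the payload bytes. The function names and the choice of a u64 little-endian prefix are assumptions made for illustration. count_frames shows the "zip through" advantage: it hops from length to length without looking inside any payload.

```rust
use std::io::{self, Read, Write};

/// Write one frame: 8-byte little-endian length, then the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u64).to_le_bytes())?;
    w.write_all(payload)
}

/// Read one frame; returns Ok(None) if the stream ends before a full length prefix.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len = [0u8; 8];
    match r.read_exact(&mut len) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    // Note: a real protocol should cap this length before allocating.
    let mut payload = vec![0u8; u64::from_le_bytes(len) as usize];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Count frames by skipping over payloads; None if the buffer is truncated.
fn count_frames(mut bytes: &[u8]) -> Option<usize> {
    let mut frames = 0;
    while !bytes.is_empty() {
        if bytes.len() < 8 {
            return None; // truncated length prefix
        }
        let mut len = [0u8; 8];
        len.copy_from_slice(&bytes[..8]);
        let len = u64::from_le_bytes(len) as usize;
        let rest = &bytes[8..];
        if rest.len() < len {
            return None; // truncated payload
        }
        bytes = &rest[len..];
        frames += 1;
    }
    Some(frames)
}
```

In an abomonation setting, the payload would be the encoded bytes of one object, so a reader could skip objects without decoding them.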

Is it okay to implement Abomonation for both T and &T?

From the point of view of abomonation's core semantics, there is nothing wrong with providing implementations of the Abomonation trait for both a type T and a reference to it &T. Basically, the implementation for &T works exactly like that of Box<T> in abomonation's current master branch.

Such implementations would be useful for high-level users of Abomonation, who stick with derives, encode, decode and measure, because they would allow safely auto-deriving Abomonation for more types. Something which, as a matter of fact, I ended up wanting for my project.

However, and that's the reason why I'm opening this issue before submitting my changes as a PR, there is also a significant ergonomic cost to doing so for any low-level user of Abomonation who calls into the trait directly.

If Abomonation is implemented for both T and &T, then anyone who uses the Abomonation trait directly instead of going through encode, decode and measure must be super-careful, because method call syntax of the x.extent() kind becomes a deadly trap that can very easily trigger the wrong Abomonation impl through auto-Deref magic.

Instead, one must get into the habit of only calling Abomonation trait methods via U::extent(&x) syntax, or, if type U is unknown, go for the somewhat less safe compromise of Abomonation::extent(&x).
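The trap can be reproduced with a hypothetical Extent trait that stands in for Abomonation. The reference impl follows the Box-like semantics described above: the pointee's bytes plus its own extent. When the receiver is a &u64, method-call syntax resolves to the u64 impl, because u64's &self method already accepts a &u64. Fully qualified syntax is needed to reach the &u64 impl.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the Abomonation trait's extent method.
trait Extent {
    fn extent(&self) -> usize;
}

impl Extent for u64 {
    fn extent(&self) -> usize {
        0 // no out-of-line data
    }
}

/// A reference serialized like a Box: the pointee's bytes plus its own extent.
impl<T: Extent> Extent for &T {
    fn extent(&self) -> usize {
        size_of::<T>() + (**self).extent()
    }
}
```

With x: u64 and r = &x, r.extent() returns 0 (the u64 impl), while <&u64 as Extent>::extent(&r) and Extent::extent(&r) both return 8 (the reference impl).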

Is this a trade-off that we can tolerate for the sake of having more Abomonation impls?

Consider taking writers by value

&mut T where T: Write implements Write, so there is no reason to ask for a borrowed writer in the API. But this is unlikely to be a problem in practice because owned writers are rarely used.
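A by-value signature could look like the hypothetical encode_bytes below, which is not abomonation's API. Because &mut W implements Write whenever W does, callers who want to keep their writer just pass &mut writer, and callers with a throwaway writer can hand it over directly.

```rust
use std::io::{self, Write};

/// Hypothetical by-value writer parameter: W may be an owned writer or a
/// `&mut` borrow of one, since `&mut W: Write` whenever `W: Write`.
fn encode_bytes<W: Write>(mut writer: W, data: &[u8]) -> io::Result<()> {
    writer.write_all(data)?;
    writer.flush()
}
```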

Should abomonation start using trybuild tests?

Resolving #27 entailed walking on some razor blades to figure out the right set of lifetime constraints needed to allow deserializing references, without allowing invalid deserializations (like deserializing a fake &'static T from stack-allocated data).

Given that someone (maybe you, maybe I) may need to touch that code again in the future, and that it is easy to get wrong, I would sleep better at night if I could add some compilation failure tests to #28 in order to make sure that some classic invalid reference deserialization examples will continue to refuse to compile in the future.

Unfortunately, rust does not have a nice built-in mechanism for that sort of tests, but someone has suggested using the trybuild crate for this purpose.

The two drawbacks are that (1) it's one more dependency, and (2) since it's based on parsing rustc output, which is not subject to any stability guarantee, those tests are likely to require occasional maintenance in the future so that they keep working on new rustc versions.

Is inline_always really needed?

Inconsiderate use of inline_always can result in code bloat (which, if taken too far, leads to L1i cache misses) and increased compilation times.

The inlining heuristics of rustc normally aren't too bad, so it might be worthwhile to investigate how many of these annotations can be replaced with plain inline or removed entirely without incurring a significant performance cost.

Shouldn't exhume take a NonNull<Self> rather than a &mut Self?

Rust references allow the compiler to assume that the data behind them is valid. One way in which rustc currently uses this is to tag the associated pointers with LLVM's dereferenceable attribute, which allows the latter to prefetch from them to its heart's content. This kind of smart optimization should not be allowed before objects are exhumed, as it can lead to undefined behavior like LLVM following dangling pointers and segfaulting the program.

Therefore, I think exhume should not take its target object as a Rust reference, but as a NonNull pointer, which provides no guarantee of target data validity to rustc and therefore doesn't allow the compiler to muck around with it.
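The proposed signature can be sketched with a hypothetical Exhume trait that takes NonNull<Self>. This is not abomonation's current API. The tuple impl shows the practical consequence: field pointers are projected with std::ptr::addr_of_mut!, so no &mut Self is ever created over data that has not yet been fixed up.

```rust
use std::ptr::{addr_of_mut, NonNull};

/// Hypothetical trait with the proposed signature.
trait Exhume {
    /// # Safety
    /// `this` must point to properly aligned, Self-sized memory whose
    /// out-of-line pointers may still be stale.
    unsafe fn exhume<'b>(this: NonNull<Self>, bytes: &'b mut [u8]) -> Option<&'b mut [u8]>;
}

impl Exhume for u64 {
    unsafe fn exhume<'b>(_this: NonNull<Self>, bytes: &'b mut [u8]) -> Option<&'b mut [u8]> {
        // Plain data: nothing to fix up and no trailing bytes consumed.
        Some(bytes)
    }
}

impl<A: Exhume, B: Exhume> Exhume for (A, B) {
    unsafe fn exhume<'b>(this: NonNull<Self>, bytes: &'b mut [u8]) -> Option<&'b mut [u8]> {
        let p = this.as_ptr();
        // Project to each field through raw pointers; no &mut (A, B) is created.
        let (a, b) = unsafe {
            (
                NonNull::new_unchecked(addr_of_mut!((*p).0)),
                NonNull::new_unchecked(addr_of_mut!((*p).1)),
            )
        };
        let rest = unsafe { A::exhume(a, bytes)? };
        unsafe { B::exhume(b, rest) }
    }
}
```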

Shouldn't entomb and encode take a self rather than a &self?

In Rust terms, abomonation serialization is effectively a sophisticated move. Therefore, there is no technical reason why it shouldn't be possible to abomonate types which are movable but not clonable, such as Box<T> where T: !Clone.

However, the entomb operation, and its encode higher-level cousin take their input object by shared reference. This makes it impossible to correctly implement the Abomonation trait for non-copyable types, which needlessly restricts its applicability.

Please consider modifying this interface to take input objects by value instead.
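A by-value entomb could look like the hypothetical EntombByValue trait below, which is not abomonation's API. Ticket is a deliberately non-Clone, non-Copy type. The Box impl consumes the box and moves its contents out, so it needs no T: Clone or T: Copy bound.

```rust
use std::io::{self, Write};

/// Hypothetical by-value variant of entomb.
trait EntombByValue {
    fn entomb<W: Write>(self, w: &mut W) -> io::Result<()>;
}

/// A deliberately non-Clone, non-Copy type.
struct Ticket(u64);

impl EntombByValue for Ticket {
    fn entomb<W: Write>(self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.0.to_ne_bytes())
    }
}

/// The Box is consumed and its contents moved out; no T: Clone bound is needed.
impl<T: EntombByValue> EntombByValue for Box<T> {
    fn entomb<W: Write>(self, w: &mut W) -> io::Result<()> {
        (*self).entomb(w)
    }
}
```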

A possible path forward for padding bytes

So, I've had a quick chat with @RalfJung about our padding bytes problem, and I think I now get a decent grasp of what we need in order to resolve that particular UB in abomonation.

Padding bytes are uninitialized memory, and we now have a safe way to model that in Rust, in the form of MaybeUninit. So we can take a first step towards handling them correctly today by casting &[T] into &[MaybeUninit<u8>] instead of &[u8].

This is enough to memcpy the bytes into another &mut [MaybeUninit<u8>] slice. But it's not yet enough to expose our uninitialized bytes to the outside world, e.g. for the purpose of sending them to Write in encode() and entomb(), because Write wants initialized bytes, not possibly uninitialized ones.
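The first step described above can be sketched with the standard library alone. A value with padding is viewed as &[MaybeUninit<u8>], copied into a Vec<MaybeUninit<u8>>, and copied back into a fresh value. Padded, as_uninit_bytes and roundtrip are illustrative names. The repr(C) layout is chosen so that the padding (3 bytes between a and b) is guaranteed.

```rust
use std::mem::{size_of_val, MaybeUninit};
use std::ptr;

/// Under repr(C), 3 padding bytes sit between `a` and `b`.
#[derive(Debug, PartialEq)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

/// View a value's bytes, padding included, as possibly-uninitialized bytes.
fn as_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: MaybeUninit<u8> has no validity requirements, so every byte of
    // `value` may be viewed this way, including padding.
    unsafe {
        std::slice::from_raw_parts(value as *const T as *const MaybeUninit<u8>, size_of_val(value))
    }
}

/// Copy the bytes out and back without ever treating padding as initialized u8.
fn roundtrip(value: &Padded) -> Padded {
    let mut buf: Vec<MaybeUninit<u8>> = Vec::new();
    buf.extend_from_slice(as_uninit_bytes(value));

    let mut out = MaybeUninit::<Padded>::uninit();
    unsafe {
        ptr::copy_nonoverlapping(buf.as_ptr(), out.as_mut_ptr() as *mut MaybeUninit<u8>, buf.len());
        // SAFETY: every field byte was copied from a valid Padded.
        out.assume_init()
    }
}
```

The sketch stops at the point the issue describes as missing. Turning buf into a &[u8] for Write::write_all would require asserting that the padding bytes are initialized, which is undefined behavior without something like freeze().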

To resolve this, we need another language functionality, which is not available yet but frequently requested from the UCG: the freeze() operation, a tool which can turn MaybeUninit<u8> into a nondeterministic valid u8 value. You can think of it as a way to opt out of the UB of reading bad data and defer to hardware "whatever was lying around at that memory address" behavior.

IIUC, something like that was proposed a long time ago, but it was initially rejected by security-conscious people on the ground that it could be used to observe the value of uninitialized memory coming from malloc(), which may leak sensitive information like cryptographic secrets which a process forgot to volatile-erase before calling free().

That precaution is commendable, but on the other hand, given the growing body of evidence that a UB-free way to access specific regions of memory is needed for many use cases (from IPC with untrusted processes to implementation of certain low-overhead thread synchronization algorithms like seqlock), I'm hopeful that we're likely to get something like that in Rust eventually (and I will in fact take steps to make this discussion move forward once I'm done with my current UCG effort).

TL;DR: For now, this is blocked on a missing Rust feature, but the issue seems understood and is likely to be eventually resolved.

potential issue with unsafe_abomonate! on latest version

Code very similar to the following works with Abomonation version 0.4.5 but not with 0.5:

#[macro_use]
extern crate abomonation;
extern crate timely;

use timely::dataflow::InputHandle;
use abomonation::Abomonation;

pub struct Foo {
  x: Vec<u8>,
  y: Vec<u8>,
}

pub struct Bar {
  z: Vec<u8>,
}

pub struct Baz {
  foo: Foo,
  bar: Bar,
}

unsafe_abomonate!(Foo: x, y);
unsafe_abomonate!(Bar: z);
unsafe_abomonate!(Baz: foo, bar);

fn main() {
  let input: InputHandle<u64, Baz> = InputHandle::new();
  // do other stuff
}

In version 0.5 I get the error:
"the trait bound Baz: abomonation::Abomonation is not satisfied
the trait abomonation::Abomonation is not implemented for Baz.

note: required because of the requirements of the impl of Timely::Data for Baz
note: required by timely::dataflow::Handle"

Did anything change in the way unsafe_abomonate! works? I noticed you removed a generic parameter, but I'm not sure if that has any effect here.

Can the pointer alignment situation be improved?

As the docs say, abomonation currently doesn't guarantee correct pointer alignment. This is pretty dangerous, even on x86, as rustc might someday be tempted to generate those evil SIMD instructions that assume the data is aligned and raise an exception otherwise.

I wonder if there is an API tweak we could use to improve upon this situation?
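One possible direction is for decoding to check alignment and refuse misaligned input instead of dereferencing it. This is offered only as an assumption, not something the issue settles on. Here is a minimal sketch for a plain u64, with a hypothetical checked_u64 function.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical checked cast: return &u64 only if `bytes` is long enough and
/// suitably aligned; otherwise None instead of a misaligned dereference.
fn checked_u64(bytes: &[u8]) -> Option<&u64> {
    let ptr = bytes.as_ptr();
    if bytes.len() < size_of::<u64>() || (ptr as usize) % align_of::<u64>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, and every bit pattern
    // is a valid u64.
    Some(unsafe { &*(ptr as *const u64) })
}
```

A full API would also need a way to obtain aligned input buffers in the first place. For example, it could allocate them as Vec<u64> rather than Vec<u8>.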

Implementations for standard library types

By comparison, serde implements its Serialize trait for a lot more types than abomonation does. It might be a nice idea to gradually add more types from std to avoid ergonomic issues caused by missing implementations.

To get started, I added a trivial implementation for PhantomData in #4, because I think it will help make integrating nalgebra and abomonation in dimforge/nalgebra#277 more ergonomic. Other types will likely be more work ;)

Should Abomonated use StableDeref?

Abomonated is not the only Rust abstraction that relies on slices of bytes Deref-ing into the same location even after a move. Pretty much every attempt at building self-referential types in Rust (which we're kinda doing inside of Abomonated) needs this property. As a result, someone has built the nice stable_deref_trait crate, which provides a trait for exactly this purpose.

We could reduce the unsafety of Abomonated<T, S>::new by requiring S: StableDeref. Unfortunately, we couldn't completely remove the unsafety in this way, because there is still the shared mutability issue to take care of. A long time ago, Rust had a nice Freeze trait for this, but that trait is now gone from the stable subset of the language and there is no sign of it coming back anytime soon. Still, I think partially removing the unsafety is worthwhile.

Using StableDeref would add an extra crate to abomonation's dependency list, but that crate is very small so I don't think it's a big issue.
