
cbor's Issues

Support integer and other types as key

Currently there seems to be no way of specifying that a field of a struct corresponds to a numeric key. I am not sure whether this would require an additional annotation, since even using #[serde(rename = "1")] would be ambiguous between the string "1" and the integer value 1.
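
For illustration, here is a hand-encoded sketch (per RFC 7049, not serde_cbor's actual API) of why the two keys differ on the wire: the integer key 1 and the text key "1" produce different bytes, so a rename string alone cannot distinguish them.

```rust
// Hand-encoded single-pair CBOR maps, small values only (illustrative sketch).
fn map_int_key(key: u8, val: u8) -> Vec<u8> {
    // 0xa1 = map with one pair; unsigned ints 0..=23 encode as a single byte
    vec![0xa1, key, val]
}

fn map_text_key(key: &str, val: u8) -> Vec<u8> {
    // 0x60 | len = short text string header
    let mut out = vec![0xa1, 0x60 | key.len() as u8];
    out.extend_from_slice(key.as_bytes());
    out.push(val);
    out
}
```

{1: 10} encodes as a1 01 0a, while {"1": 10} encodes as a1 61 31 0a.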

`de::from_reader` should implement `DeserializeOwned`, not `Deserialize`

I tried to use serde_cbor::de::from_reader to deserialize into a &str and got a surprising runtime error like this:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message("invalid type: string \"foobar\", expected a borrowed string"), offset: 0 }', libcore/result.rs:945:5

It makes sense that from_reader can't do zero-copy like from_slice since there's no source buffer. However, attempting to do so should cause a compile-time failure, not a runtime failure. docopt had a similar bug: docopt/docopt.rs#222

Deserialize multiple objects

I'm using serde_cbor in a linux fifo to pass commands from one daemon to another. When the receiving daemon is not fast enough and two structures are put into the fifo, cbor fails with "TrailingBytes".

Therefore it would be helpful to have a method that returns the number of bytes that have already been decoded. With that information, an iterator, for example, could be implemented.

Maybe such functionality is already in the pipeline?
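
Until such an API exists, one workaround (a standard-library sketch, not part of serde_cbor) is to length-prefix each message on the fifo so the reader always knows where one structure ends:

```rust
use std::io::{Read, Write};

// Length-prefix framing: a big-endian u32 length followed by the payload.
fn write_frame<W: Write>(w: &mut W, msg: &[u8]) -> std::io::Result<()> {
    w.write_all(&(msg.len() as u32).to_be_bytes())?;
    w.write_all(msg)
}

fn read_frame<R: Read>(r: &mut R) -> std::io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut buf = vec![0u8; u32::from_be_bytes(len) as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```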

Packed encoding for structs is broken when fields are conditionally skipped

Because serde doesn't provide field indices when serializing, the serializer currently guesses the index based on the number of fields serialized so far. This doesn't work when fields are conditionally skipped, however:

extern crate serde;
extern crate serde_cbor;
#[macro_use]
extern crate serde_derive;

#[derive(Serialize, Deserialize)]
struct Foo {
    a: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
    b: Option<u32>,
    c: u32,
}

#[test]
fn foo() {
    let foo = Foo {
        a: 0,
        b: None,
        c: 1,
    };
    let buf = serde_cbor::ser::to_vec_packed(&foo).unwrap();
    serde_cbor::from_slice::<Foo>(&buf).unwrap();
}
running 1 test
test foo ... FAILED

failures:

---- foo stdout ----
	thread 'foo' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("missing field `c`")', /checkout/src/libcore/result.rs:859
note: Run with `RUST_BACKTRACE=1` for a backtrace.


failures:
    foo

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured

error: test failed, to rerun pass '--lib'

Completed with code 101
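
The mismatch can be made concrete by hand-encoding the two packed maps (an illustrative sketch per RFC 7049, not serde_cbor's code): with b skipped, the index-guessing serializer writes {0: 0, 1: 1}, while the deserializer looks for field c under key 2, i.e. {0: 0, 2: 1}.

```rust
// Packed struct = CBOR map from field index to value (small values only).
fn packed_struct(pairs: &[(u8, u8)]) -> Vec<u8> {
    let mut out = vec![0xa0 | pairs.len() as u8]; // map header with pair count
    for &(index, value) in pairs {
        out.push(index); // uints 0..=23 encode as themselves
        out.push(value);
    }
    out
}
```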

Serde 0.9 support

The API changes are pretty significant, looks like.

I can tackle this if you won't have a chance, @pyfisch.

Can not zero-copy deserialize bytes

#[derive(Debug)]
#[derive(Serialize, Deserialize)]
struct Bar<'a>(&'a [u8]);

#[derive(Debug)]
#[derive(Serialize, Deserialize)]
struct Bar2<'a>(&'a str);

fn main() {
    let bar = Bar2("123");
    let c = serde_cbor::to_vec(&bar).unwrap();
    let m: Bar2 = serde_cbor::from_slice(&c).unwrap();
    // ^ works

    let bar = Bar(&[1, 2, 3]);
    let c = serde_cbor::to_vec(&bar).unwrap();
    let m: Bar = serde_cbor::from_slice(&c).unwrap();
    // ^ panics
    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message("invalid type: sequence, expected a borrowed byte array"), offset: 0 }', libcore/result.rs:945:5
}

serde_cbor can zero-copy deserialize &str, but not &[u8]. I think this is a bug.
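
The error message hints at the cause: &[1, 2, 3] is serialized as a CBOR array (major type 4) rather than a byte string (major type 2), and only a byte string can be borrowed as &[u8]. A hand-encoded comparison (illustrative sketch, short inputs only):

```rust
fn cbor_byte_string(data: &[u8]) -> Vec<u8> {
    let mut out = vec![0x40 | data.len() as u8]; // 0x40 = byte string header
    out.extend_from_slice(data);
    out
}

fn cbor_u8_array(data: &[u8]) -> Vec<u8> {
    let mut out = vec![0x80 | data.len() as u8]; // 0x80 = array header
    out.extend_from_slice(data); // assumes all values are < 24
    out
}
```

The usual workaround is the serde_bytes crate, which makes serde treat the field as bytes rather than a sequence.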

Relicense under dual MIT/Apache-2.0

This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic on IRC to discuss.

You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.

TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.

Why?

The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.

Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.

How?

To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:

## License

Licensed under either of

 * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.

and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):

// Copyright 2016 cbor developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.

Be sure to add the relevant LICENSE-{MIT,APACHE} files. You can copy these from the Rust repo for a plain-text version.

And don't forget to update the license metadata in your Cargo.toml to:

license = "MIT/Apache-2.0"

I'll be going through projects which agree to be relicensed and have approval
by the necessary contributors and doing these changes, so feel free to leave
the heavy lifting to me!

Contributor checkoff

To agree to relicensing, comment with :

I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.

Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.

(De)serialize 128 bit integers

The program crashes when trying to serialize them, complaining that i128/u128 is not supported.
Serde itself already has support for these types.
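
For values that don't fit in 64 bits, CBOR offers bignums (RFC 7049 §2.4.2): tag 2 for an unsigned bignum followed by the magnitude as a big-endian byte string. A sketch of what u128 support could emit for large values (illustrative, not serde_cbor's implementation):

```rust
fn encode_u128_bignum(v: u128) -> Vec<u8> {
    let magnitude: Vec<u8> = v
        .to_be_bytes()
        .iter()
        .copied()
        .skip_while(|&b| b == 0) // bignums drop leading zero bytes
        .collect();
    let mut out = vec![0xc2]; // tag 2 = unsigned bignum
    out.push(0x40 | magnitude.len() as u8); // byte string header (at most 16 bytes)
    out.extend_from_slice(&magnitude);
    out
}
```

(Values fitting in 64 bits could keep the ordinary uint encoding; only larger ones need the tag.)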

`Option<T>` does not round-trip

extern crate serde_cbor;

fn main() {
    let obj1 = Some(10u32);

    let mut v = vec![];
    assert!(serde_cbor::ser::to_writer(&mut v, &obj1).is_ok());
    println!("{:?}", v);
    let obj2: Result<Option<u32>, _> = serde_cbor::de::from_reader(&v[..]);

    assert_eq!(obj1, obj2.unwrap());
}

This code fails with:
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: SyntaxError(invalid syntax: "incorrect type", 0)', ../src/libcore/result.rs:738

I have code in production that fails with "trailing bytes" when I have a structure whose last element is an Option; I haven't produced a small test case because it's such a PITA to write Serialize/Deserialize impls for structures.
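
For context on the wire format: CBOR has no wrapper for Option, so Some(10u32) encodes as the bare integer 10 (0x0a) and None as null (0xf6); the deserializer has to accept either. A hand-encoded sketch for small values:

```rust
fn encode_option_u8(v: Option<u8>) -> Vec<u8> {
    match v {
        Some(n) if n < 24 => vec![n], // small uints are a single byte
        Some(n) => vec![0x18, n],     // 0x18 = one-byte uint follows
        None => vec![0xf6],           // 0xf6 = null
    }
}
```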

Update crates.io

Hi,

I'm going to release a crate based on this one. And for that crate, it's crucial to support larger sequences.

So would you mind releasing the current master to crates.io?

IO error EOF getting through

Let's say I have this code:

extern crate serde_cbor;

use std::collections::HashMap;

use serde_cbor::{from_slice, to_vec};

fn main() {
    let mut data = HashMap::new();
    data.insert(42, "Hello");
    let encoded = to_vec(&data).expect("Failed to encode");
    let decoded: HashMap<usize, String> = from_slice(&encoded[..encoded.len() - 1]).expect("Failed to decode");
}

The last line correctly fails, as the last byte is missing. However, this is the error I get:

Failed to decode: Io(Error { repr: Custom(Custom { kind: UnexpectedEof, error: StringError("failed to fill whole buffer") }) })

Looking at the documentation, I'd expect the error to be Eof, not Io(Custom(… io-based-eof…)). To be fair, the documentation doesn't explicitly say that all EOF-based errors are going to be reported as Eof, but it was still a bit of a surprise.
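
The normalization asked for here could look like the following sketch (names hypothetical, not serde_cbor's actual error type): fold every io-level UnexpectedEof into a single library-level Eof variant.

```rust
use std::io;

#[derive(Debug, PartialEq)]
enum DecodeError {
    Eof,
    Io(io::ErrorKind),
}

// Map an io::Error to the library error, special-casing end-of-input.
fn classify(e: io::Error) -> DecodeError {
    match e.kind() {
        io::ErrorKind::UnexpectedEof => DecodeError::Eof,
        kind => DecodeError::Io(kind),
    }
}
```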

Better README and documentation

Rewrite the main docs page and the README with an example better reflecting the uses of this crate and of CBOR. The unmaintained label should go.

Polish the crate [meta bug]

Thanks to the rewrite by @sfackler the crate is now in a good shape. I have opened #45 and #46 for specific things that should be improved.

@sfackler: How do you want to adopt the crate? You are already registered as a contributor; do you want me to transfer the repo? And do you need access to the crates.io account?

I have attached the API guidelines checklist, I will try to check it myself but help is appreciated.

cc @jq-rs, @arthurprs

Rust API Guidelines Checklist

  • Naming (crate aligns with Rust naming conventions)
    • Casing conforms to RFC 430 (C-CASE)
    • Ad-hoc conversions follow as_, to_, into_ conventions (C-CONV)
    • Getter names follow Rust convention (C-GETTER)
    • Methods on collections that produce iterators follow iter, iter_mut, into_iter (C-ITER)
    • Iterator type names match the methods that produce them (C-ITER-TY)
    • Feature names are free of placeholder words (C-FEATURE)
    • Names use a consistent word order (C-WORD-ORDER)
  • Interoperability (crate interacts nicely with other library functionality)
    • Types eagerly implement common traits (C-COMMON-TRAITS)
      • Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Debug,
        Display, Default
    • Conversions use the standard traits From, AsRef, AsMut (C-CONV-TRAITS)
    • Collections implement FromIterator and Extend (C-COLLECT)
    • Data structures implement Serde's Serialize, Deserialize (C-SERDE)
    • Types are Send and Sync where possible (C-SEND-SYNC)
    • Error types are meaningful and well-behaved (C-GOOD-ERR)
    • Binary number types provide Hex, Octal, Binary formatting (C-NUM-FMT)
    • Generic reader/writer functions take R: Read and W: Write by value (C-RW-VALUE)
  • Macros (crate presents well-behaved macros)
  • Documentation (crate is abundantly documented)
    • Crate level docs are thorough and include examples (C-CRATE-DOC)
    • All items have a rustdoc example (C-EXAMPLE)
    • Examples use ?, not try!, not unwrap (C-QUESTION-MARK)
    • Function docs include error, panic, and safety considerations (C-FAILURE)
    • Prose contains hyperlinks to relevant things (C-LINK)
    • Cargo.toml includes all common metadata (C-METADATA)
      • authors, description, license, homepage, documentation, repository,
        readme, keywords, categories
    • Crate sets html_root_url attribute "https://docs.rs/CRATE/X.Y.Z" (C-HTML-ROOT)
    • Release notes document all significant changes (C-RELNOTES)
    • Rustdoc does not show unhelpful implementation details (C-HIDDEN)
  • Predictability (crate enables legible code that acts how it looks)
    • Smart pointers do not add inherent methods (C-SMART-PTR)
    • Conversions live on the most specific type involved (C-CONV-SPECIFIC)
    • Functions with a clear receiver are methods (C-METHOD)
    • Functions do not take out-parameters (C-NO-OUT)
    • Operator overloads are unsurprising (C-OVERLOAD)
    • Only smart pointers implement Deref and DerefMut (C-DEREF)
    • Constructors are static, inherent methods (C-CTOR)
  • Flexibility (crate supports diverse real-world use cases)
    • Functions expose intermediate results to avoid duplicate work (C-INTERMEDIATE)
    • Caller decides where to copy and place data (C-CALLER-CONTROL)
    • Functions minimize assumptions about parameters by using generics (C-GENERIC)
    • Traits are object-safe if they may be useful as a trait object (C-OBJECT)
  • Type safety (crate leverages the type system effectively)
    • Newtypes provide static distinctions (C-NEWTYPE)
    • Arguments convey meaning through types, not bool or Option (C-CUSTOM-TYPE)
    • Types for a set of flags are bitflags, not enums (C-BITFLAG)
    • Builders enable construction of complex values (C-BUILDER)
  • Dependability (crate is unlikely to do the wrong thing)
  • Debuggability (crate is conducive to easy debugging)
  • Future proofing (crate is free to improve without breaking users' code)
  • Necessities (to whom they matter, they really matter)
    • Public dependencies of a stable crate are stable (C-STABLE)
    • Crate and its dependencies have a permissive license (C-PERMISSIVE)

Packed format is broken/fragile

The packed format for structs uses the order of the calls to serialize_field to decide on the id of the field for the map.

This is very fragile, and seems to break with the current serde-codegen generated code (sorry, I don't have an example here; it's hard to extract it from my current project). I've had cases where a simple to_writer_packed / from_reader round-trip failed.

A more robust approach would be to serialize the fields as a sequence instead, as this makes it obvious that the order matters. Or look up the field name when serializing to determine the index in the struct fields array.
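
The second suggestion could be sketched like this (field names hypothetical): derive the packed key from the struct's static field list rather than from call order, so skipped fields keep stable indices.

```rust
// serde derive knows the full field list at compile time; look the name up.
const FIELDS: &[&str] = &["a", "b", "c"];

fn packed_key(field: &str) -> Option<u64> {
    FIELDS.iter().position(|&f| f == field).map(|i| i as u64)
}
```

With this, skipping "b" still serializes "c" under key 2, so deserialization does not lose fields.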

Broken roundtrip serialization

#54 (which attempted to fix #51 and #52) actually broke deserialization of Ipv4Addr (or, really, any other human-readability-dependent format): the Deserializer must have is_human_readable return the same value as the Serializer.

Right now, if you try to serialize Ipv4Addr, it will serialize it as sequence of bytes, but will try to deserialize as a string, resulting in an error:

called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message("invalid type: sequence, expected a string"), offset: 0 }

deserializing big byte arrays using `from_reader` does not round-trip

extern crate serde_cbor;
extern crate serde_bytes;

use std::io::Cursor;
use serde_bytes::ByteBuf;

fn main() {
    let input = ByteBuf::from(vec![0u8; 2048 * 1024]);
    let cbor = serde_cbor::to_vec(&input).unwrap();
    let output: ByteBuf = serde_cbor::from_reader(Cursor::new(cbor.as_slice())).unwrap();
    assert_eq!(input.len(), output.len());
}

I would expect input and output to have the same length, but instead output is only 16KiB long.
It works fine though using serde_cbor::from_slice instead of serde_cbor::from_reader.

Deserialize value, returning rest of slice

This is related to #61 but not the same -- I want to deserialize CBOR values as they stream in, without throwing away the rest of the buffer; with serde_json, I do:

fn decode(&mut self) -> Result<(), Error> {
    ({
        let de = Deserializer::from_slice(&self.recv_bytes_queue);
        let mut s = de.into_iter();
        loop {
            match s.next() {
                Some(Ok(value)) => self.recv_value_queue.push_front(value),
                Some(Err(err)) => {
                    if err.is_eof() {
                        break Ok(Some(s.byte_offset()));
                    } else {
                        break Err(err.into());
                    }
                }
                None => break Ok(None),
            }
        }
    }).map(|l| match l {
        Some(l) => {
            // TODO: Make this more efficient.
            self.recv_bytes_queue.drain(..l);
        }
        None => self.recv_bytes_queue.clear(),
    })
}

However, no equivalent StreamDeserializer.byte_offset() exists for serde_cbor.
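
One stopgap without deserializer support (standard-library sketch): wrap the source in a counting reader and inspect how many bytes were consumed. Note this only reflects bytes actually pulled from the source; a buffering deserializer may read ahead.

```rust
use std::io::Read;

// A reader wrapper that counts how many bytes have been consumed.
struct Counted<R> {
    inner: R,
    consumed: usize,
}

impl<R: Read> Read for Counted<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.consumed += n;
        Ok(n)
    }
}
```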

Detecting invalid cbor when deserializing multiple objects

I am deserializing multiple objects from a slice (like in #20).

I am using the EOF error to detect when I reached the last object, but I have no way to distinguish between an actual EOF and an invalid cbor object.

It would be great to have an Iterator implementation that does the right thing, or for the thrown error to carry not the current offset in the stream but the offset where the deserialization began.

no_std support

Serde supports no_std nicely (judging from the documentation); it would be nice if serde_cbor could too.

I'm currently looking around to see which parts of the crate would need to be conditional on std's presence, i.e. cfg(feature = ...) gating; I will keep this issue updated.

Bad error deserializing empty map to ()

A "long form" empty map will not deserialize:

extern crate serde_cbor;

fn main() {
    let _: () = serde_cbor::from_slice(&[191, 255]).unwrap();
}
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: TrailingBytes', ../src/libcore/result.rs:746
note: Run with `RUST_BACKTRACE=1` for a backtrace.
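
For reference, [0xbf, 0xff] is the indefinite-length ("long form") empty map: 0xbf opens an indefinite map and 0xff is the break stop code, while 0xa0 is the definite form. A decoder should treat both as an empty map; a minimal recognizer sketch:

```rust
fn is_empty_map(bytes: &[u8]) -> bool {
    // 0xa0 = definite empty map; 0xbf 0xff = indefinite map, immediately closed
    matches!(bytes, [0xa0] | [0xbf, 0xff])
}
```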

Crate update

Please publish an updated crate that includes the latest changes.

Enum support

I wonder if we can have support for enums in a similar manner as serde_json does. Not sure if this would break the standard, though. This would be a great addition.

Remove EOF check during deserialization?

I use bincode in one of my programs, and I'm interested in switching to CBOR to make interoperability with other languages easier. One blocker that I have at the moment is the difficulty of quickly reading a small header struct in a very large file (2–4 GiB). Consider these data structures:

#[derive(Serialize, Deserialize)]
struct Header {
    magic: u32,
    timestamp: i64
}

#[derive(Serialize, Deserialize)]
struct Bucket {
    header: Header,
    data: HashMap<String, Info>
}

If I serialize a Bucket to disk using bincode, I can then deserialize just the header:

bincode::serialize_into(&mut writer, &my_bucket);
let my_header: Header = bincode::deserialize_from(&mut reader).unwrap();

With serde_cbor, I need to read the entire file to get at the header, and that makes the program unacceptably slow. I thought about serializing the header and the HashMap one after the other:

serde_cbor::to_writer(&mut writer, &my_header);
serde_cbor::to_writer(&mut writer, &my_data);

Unfortunately, this doesn't work: when I try to deserialize just the header, the deserializer asserts that the reader has been read entirely.

Would it be possible to either remove this end-of-stream check or to add another deserializer function that doesn't return an error when there is still data in the buffer?

Compact/packed format

Hi, it'd be nice to use (have an option to use) a compact/packed serialization format (like bincode and rmp) that serializes structs like lists (thus relying on the order of the fields). This allows faster (de)serialization and more compact output.

rmp (msgpack)

[
  2837513946597, 
  123456, 
  [
    1, 
    []
  ], 
  [
    [
      2, 
      []
    ], 
    200, 
    503, 
    520, 
    [
      1, 
      []
    ], 
    "text/html", 
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36", 
    "https://www.cloudflare.com/", 
    "/cdn-cgi/trace"
  ], 
  [
    "1.2.3.4", 
    8000, 
    "www.example.com", 
    [
      2, 
      []
    ]
  ], 
  [
    238, 
    []
  ], 
  [
    3, 
    []
  ], 
  "192.168.1.1", 
  "metal.cloudflare.com", 
  "10.1.2.3", 
  123456, 
  "10c73629cce30078-LAX"
]

cbor

{
    "timestamp": 2837513946597,
    "zone_id": 123456,
    "zone_plan": null,
    "http": {
        "protocol": null,
        "status": 200,
        "host_status": 503,
        "up_status": 520,
        "method": null,
        "content_type": "text/html",
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36",
        "referer": "https://www.cloudflare.com/",
        "request_uri": "/cdn-cgi/trace"
    },
    "origin": {
        "ip": "1.2.3.4",
        "port": 8000,
        "hostname": "www.example.com",
        "protocol": null
    },
    "country": null,
    "cache_status": null,
    "server_ip": "192.168.1.1",
    "server_name": "metal.cloudflare.com",
    "remote_ip": "10.1.2.3",
    "bytes_dlv": 123456,
    "ray_id": "10c73629cce30078-LAX"
}

how to make sure a Vec<u8> is serialized as `bytes`?

I have a structure with a Vec<u8> that I serialize from Rust, using cbor::ser::to_writer_packed

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct LevelInfo {
    pub name: String,
    pub intro: Vec<Span>,
    pub code: Vec<u8>,
}

in Python, the code field deserializes to something like

 2: [127,
     69,
     76,
     70,
     1,
     1,
     1,
     0,

What I would expect is b'\x7fELF\x01\x01\x01\x00'. It looks like it is using an array of individual values instead.
This uses a lot of extra space; is it possible to avoid it and use CBOR's native bytes type?
(Sorry if this is a stupid question; I could not find anything about annotations!)
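
The overhead is real: as a CBOR byte string, n bytes cost n plus a small header, while as an array every byte value >= 24 costs two bytes on its own. A rough size calculation (illustrative sketch for inputs under 64 KiB):

```rust
fn cbor_header_size(len: usize) -> usize {
    if len < 24 { 1 } else if len < 256 { 2 } else { 3 }
}

fn byte_string_size(data: &[u8]) -> usize {
    cbor_header_size(data.len()) + data.len()
}

fn u8_array_size(data: &[u8]) -> usize {
    // each element: 1 byte for values 0..=23, 2 bytes for 24..=255
    cbor_header_size(data.len())
        + data.iter().map(|&b| if b < 24 { 1 } else { 2 }).sum::<usize>()
}
```

In practice the common answer is the serde_bytes crate (wrapping the field so serde sees it as bytes), which makes Vec<u8> serialize to CBOR's native byte string type.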

Serde 0.8!

Serde v0.8 is now out. We should update cbor to work with it.

Override `is_human_readable`

According to the docs, when is_human_readable is false the serializer can produce a more compact and efficient form. I think this is what is expected from a CBOR serializer. (The default is true.)

Unfortunately, changing it is a breaking change, but I think that's okay in a zero-point release.

So I'll make a PR, if you're fine with that?

An alternative would be to make it configurable, but I don't think human-readable output in binary CBOR is a good default.

Change `Serialize::Error` to `io::Error`

Unless I'm mistaken, the serializer can't possibly spit out any error but an IO error. To make error handling simpler, it would be nice if Serialize::Error was set to io::Error instead of serde_cbor::Error. One can always just convert it to a serde_cbor::Error if necessary.

Use of forward_to_deserialize_any! leads to surprising visitor calls

I was attempting to (de-)serialize an enum from an integer, as described in https://serde.rs/enum-number.html.

When deserializing, I got the somewhat confusing error message invalid type: integer `1`, expected [My Enum].

The issue is, as I found out later, that serde_cbor uses forward_to_deserialize_any!. The implementation of deserialize_any() then visits the integer type actually encoded in the data rather than the requested / originally encoded type (in my case, u8 is visited rather than u32).

This behavior was very surprising to me. I could not find anything conclusive in the serde docs about how the deserializer should behave in this case, however ignoring the explicitly requested type seems wrong to me.
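
Part of the confusion is that CBOR stores values, not Rust integer types: 1u8 and 1u32 both encode to the single byte 0x01, so a self-describing deserializer only sees "some unsigned integer" and picks a width from the wire, not from the target type. An encoding sketch per RFC 7049:

```rust
fn encode_uint(v: u32) -> Vec<u8> {
    match v {
        0..=23 => vec![v as u8],          // value fits in the header byte
        24..=0xff => vec![0x18, v as u8], // 0x18 = one-byte uint follows
        0x100..=0xffff => {
            let b = (v as u16).to_be_bytes();
            vec![0x19, b[0], b[1]]        // 0x19 = two-byte uint
        }
        _ => {
            let b = v.to_be_bytes();
            vec![0x1a, b[0], b[1], b[2], b[3]] // 0x1a = four-byte uint
        }
    }
}
```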

How to deserialize a buffer of unknown size?

I would like to serialize a struct and store it in a file, padded out with zeroes to a fixed length. But when I read the file in, including the zeroes, and try to deserialize it, I get a TrailingData error. It seems that serde_cbor::from_slice and serde_cbor::from_reader require the input to have precisely the expected size. Do I have to store the packed structure's size and write that to the file too, outside of the packed structure? That seems redundant. It's also difficult, because to_writer does not return the amount of data written.
