pyfisch / cbor
CBOR support for serde.
Home Page: https://docs.rs/serde_cbor/
License: Apache License 2.0
I need to serialize byte strings larger than 0.5 MB. Can the constant be customized or just dropped?
Currently there seems to be no way of specifying that a field of a struct corresponds to a numeric key. I am not sure whether this would require an additional annotation, since even using #[serde(rename = "1")]
would be ambiguous between the string "1" and the integer value 1.
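For illustration, here is the byte-level difference between the two readings per RFC 7049: a hand-rolled sketch of the map {1: "a"} versus {"1": "a"}, not serde_cbor API (the helper names are made up).

```rust
// Sketch: CBOR can distinguish the integer key 1 from the string key "1".
// Bytes hand-assembled per RFC 7049; not produced by serde_cbor.
fn map_with_int_key() -> Vec<u8> {
    // 0xa1 = map of 1 pair, 0x01 = unsigned integer 1, 0x61 b'a' = text "a"
    vec![0xa1, 0x01, 0x61, b'a']
}

fn map_with_str_key() -> Vec<u8> {
    // 0x61 b'1' = text string "1"
    vec![0xa1, 0x61, b'1', 0x61, b'a']
}

fn main() {
    // The two encodings differ, which is exactly why rename = "1" is ambiguous.
    assert_ne!(map_with_int_key(), map_with_str_key());
    println!("{:02x?} vs {:02x?}", map_with_int_key(), map_with_str_key());
}
```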
I tried to use serde_cbor::de::from_reader
to deserialize into a &str
and got a surprising runtime error like this:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message("invalid type: string \"foobar\", expected a borrowed string"), offset: 0 }', libcore/result.rs:945:5
It makes sense that from_reader
can't do zero-copy like from_slice
since there's no source buffer. However, attempting to do so should cause a compile-time failure, not a runtime failure. docopt had a similar bug: docopt/docopt.rs#222
I'm using serde_cbor in a Linux fifo to pass commands from one daemon to another. When the receiving daemon is not fast enough and two structures are put into the fifo, cbor fails with "TrailingBytes".
Therefore it would be helpful to have a method that returns the number of bytes that have already been decoded. With that information an iterator, for example, could be implemented.
Maybe such functionality is already in the pipeline?
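Until such a method exists, one workaround is to measure an item's encoded size by hand. The following is a rough sketch (my own code, not part of serde_cbor) that walks a definite-length CBOR item per RFC 7049 and returns how many bytes it occupies, so concatenated items in a pipe can be split before each slice is handed to the deserializer; indefinite-length items are out of scope.

```rust
use std::convert::TryInto;

// Return how many bytes the first definite-length CBOR item in `buf` occupies.
// Handles definite lengths only; indefinite-length and reserved headers yield None.
fn item_len(buf: &[u8]) -> Option<usize> {
    let first = *buf.first()?;
    let major = first >> 5;
    let info = first & 0x1f;
    // Decode the header "argument" (a length or an embedded value) and the header size.
    let (arg, head): (u64, usize) = match info {
        0..=23 => (info as u64, 1),
        24 => (*buf.get(1)? as u64, 2),
        25 => {
            let b: [u8; 2] = buf.get(1..3)?.try_into().ok()?;
            (u16::from_be_bytes(b) as u64, 3)
        }
        26 => {
            let b: [u8; 4] = buf.get(1..5)?.try_into().ok()?;
            (u32::from_be_bytes(b) as u64, 5)
        }
        27 => {
            let b: [u8; 8] = buf.get(1..9)?.try_into().ok()?;
            (u64::from_be_bytes(b), 9)
        }
        _ => return None, // indefinite length or reserved
    };
    let total = match major {
        0 | 1 | 7 => head,            // ints, simple values, floats: header only
        2 | 3 => head + arg as usize, // byte/text string: header plus payload
        4 => {
            // array: header plus the sizes of `arg` elements
            let mut off = head;
            for _ in 0..arg {
                off += item_len(buf.get(off..)?)?;
            }
            off
        }
        5 => {
            // map: header plus a key and a value per pair
            let mut off = head;
            for _ in 0..(arg * 2) {
                off += item_len(buf.get(off..)?)?;
            }
            off
        }
        6 => head + item_len(buf.get(head..)?)?, // tag wraps exactly one item
        _ => unreachable!(),
    };
    Some(total)
}

fn main() {
    // Two concatenated items: the array [1, 2] and the text string "hi".
    let stream = [0x82, 0x01, 0x02, 0x62, b'h', b'i'];
    let first = item_len(&stream).unwrap();
    assert_eq!(first, 3);
    assert_eq!(item_len(&stream[first..]).unwrap(), 3);
}
```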
serde + bincode serialization and deserialization work for the attached file, but serde_cbor fails at deserialization with the following error message:
error: "trailing bytes" at byte position 11
Because serde doesn't provide field indices when serializing, the serializer currently guesses the index based on the number of fields serialized. This doesn't work when fields are conditionally skipped, however:
extern crate serde;
extern crate serde_cbor;
#[macro_use]
extern crate serde_derive;

#[derive(Serialize, Deserialize)]
struct Foo {
    a: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
    b: Option<u32>,
    c: u32,
}

#[test]
fn foo() {
    let foo = Foo {
        a: 0,
        b: None,
        c: 1,
    };
    let buf = serde_cbor::ser::to_vec_packed(&foo).unwrap();
    serde_cbor::from_slice::<Foo>(&buf).unwrap();
}
running 1 test
test foo ... FAILED
failures:
---- foo stdout ----
thread 'foo' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("missing field `c`")', /checkout/src/libcore/result.rs:859
note: Run with `RUST_BACKTRACE=1` for a backtrace.
failures:
foo
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured
error: test failed, to rerun pass '--lib'
Completed with code 101
How do I use byte strings with serde_cbor?
I want "64.71.168.211".parse::<std::net::Ipv4Addr>().unwrap().octets()
to be 44 40 47 a8 d3, not 84 18 40 18 47 18 a8 18 d3.
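For reference, RFC 7049 encodes a short byte string as a single header byte (major type 2, 0x40, plus the length) followed by the raw bytes, so the four IPv4 octets come out as 44 40 47 a8 d3. A minimal hand-rolled sketch, not serde_cbor API:

```rust
// Sketch per RFC 7049: a short CBOR byte string is one header byte
// (major type 2, i.e. 0x40, OR'd with the length) followed by the raw bytes.
fn encode_byte_string(data: &[u8]) -> Vec<u8> {
    assert!(data.len() < 24, "sketch handles short strings only");
    let mut out = vec![0x40 | data.len() as u8];
    out.extend_from_slice(data);
    out
}

fn main() {
    let octets = "64.71.168.211".parse::<std::net::Ipv4Addr>().unwrap().octets();
    assert_eq!(encode_byte_string(&octets), [0x44, 0x40, 0x47, 0xa8, 0xd3]);
}
```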
The API changes are pretty significant, looks like.
I can tackle this if you won't have a chance, @pyfisch.
#[derive(Debug, Serialize, Deserialize)]
struct Bar<'a>(&'a [u8]);

#[derive(Debug, Serialize, Deserialize)]
struct Bar2<'a>(&'a str);

fn main() {
    let bar = Bar2("123");
    let c = serde_cbor::to_vec(&bar).unwrap();
    let m: Bar2 = serde_cbor::from_slice(&c).unwrap();
    // ^ works

    let bar = Bar(&[1, 2, 3]);
    let c = serde_cbor::to_vec(&bar).unwrap();
    let m: Bar = serde_cbor::from_slice(&c).unwrap();
    // ^ panics:
    // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message("invalid type: sequence, expected a borrowed byte array"), offset: 0 }', libcore/result.rs:945:5
}
serde_cbor can deserialize &str, but not &[u8]. I think it's a bug.
Check out the release notes here: https://github.com/serde-rs/serde/releases/tag/v0.9.0
Unlike in JSON, I expect std::net::Ipv4Addr
to be 5 bytes in CBOR...
This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic
on IRC to discuss.
You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.
TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.
Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with
it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:
## License
Licensed under either of
* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):
// Copyright 2016 cbor developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.
Be sure to add the relevant LICENSE-{MIT,APACHE}
files. You can copy these
from the Rust repo for a plain-text
version.
And don't forget to update the license
metadata in your Cargo.toml
to:
license = "MIT/Apache-2.0"
I'll be going through projects which agree to be relicensed and have approval
by the necessary contributors and doing these changes, so feel free to leave
the heavy lifting to me!
To agree to relicensing, comment with:
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.
Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
The program crashes when trying to serialize them, complaining that i128/u128 are not supported.
Serde itself already has support for these types.
extern crate serde_cbor;

fn main() {
    let obj1 = Some(10u32);
    let mut v = vec![];
    assert!(serde_cbor::ser::to_writer(&mut v, &obj1).is_ok());
    println!("{:?}", v);
    let obj2: Result<Option<u32>, _> = serde_cbor::de::from_reader(&v[..]);
    assert_eq!(obj1, obj2.unwrap());
}
This code fails with
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: SyntaxError(invalid syntax: "incorrect type", 0)', ../src/libcore/result.rs:738
I have code in production that fails with "trailing bytes" when I have a structure whose last element is an Option; I haven't produced a small test case because it's such a PITA to write Serialize/Deserialize impls for structures.
Hi,
I'm going to release a crate based on this one. And for that crate, it's crucial to support larger sequences.
So would you mind releasing the current master to crates.io?
In the current draft, the bundle protocol [0] uses indefinite-length arrays in its binary representation.
While deserialization of those was easy, I have yet to find a way to serialize a data structure into
that format.
[0] https://tools.ietf.org/html/draft-ietf-dtn-bpbis-11#section-4.2.1
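For reference, the wire format itself is simple per RFC 7049 §2.2.1: 0x9f opens an indefinite-length array, the items follow, and 0xff ("break") closes it. A hand-rolled sketch of that framing, not serde_cbor API:

```rust
// Sketch: wrap already-encoded CBOR items in an indefinite-length array.
// 0x9f = major type 4 with additional info 31; 0xff = the "break" stop code.
fn indefinite_array(items: &[Vec<u8>]) -> Vec<u8> {
    let mut out = vec![0x9f];
    for item in items {
        out.extend_from_slice(item);
    }
    out.push(0xff);
    out
}

fn main() {
    // Items: the unsigned integers 1 and 2.
    let enc = indefinite_array(&[vec![0x01], vec![0x02]]);
    assert_eq!(enc, [0x9f, 0x01, 0x02, 0xff]);
}
```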
Let's say I have this code:
extern crate serde_cbor;

use std::collections::HashMap;
use serde_cbor::{from_slice, to_vec};

fn main() {
    let mut data = HashMap::new();
    data.insert(42, "Hello");
    let encoded = to_vec(&data).expect("Failed to encode");
    let decoded: HashMap<usize, String> =
        from_slice(&encoded[..encoded.len() - 1]).expect("Failed to decode");
}
The last line correctly fails, as the last byte is missing. However, this is the error I get:
Failed to decode: Io(Error { repr: Custom(Custom { kind: UnexpectedEof, error: StringError("failed to fill whole buffer") }) })
Looking at the documentation, I'd expect the error to be Eof
, not Io(Custom(… io-based-eof…))
. To be fair, the documentation doesn't explicitly say that all EOF-based errors are going to be reported as Eof
, but it was still a bit of a surprise.
Rewrite the main docs page and the README with an example better reflecting the uses of this crate and of CBOR. The unmaintained label should go.
Thanks to the rewrite by @sfackler the crate is now in a good shape. I have opened #45 and #46 for specific things that should be improved.
@sfackler: How do you want to adopt the crate? You are already registered as a contributor; do you want to transfer the repo? And do you need access to the crates.io account?
I have attached the API guidelines checklist, I will try to check it myself but help is appreciated.
cc @jq-rs, @arthurprs
* as_, to_, into_ conventions (C-CONV)
* iter, iter_mut, into_iter (C-ITER)
* Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Debug, Display, Default
* From, AsRef, AsMut (C-CONV-TRAITS)
* FromIterator and Extend (C-COLLECT)
* Serialize, Deserialize (C-SERDE)
* Send and Sync where possible (C-SEND-SYNC)
* Hex, Octal, Binary formatting (C-NUM-FMT)
* R: Read and W: Write by value (C-RW-VALUE)
* ?, not try!, not unwrap (C-QUESTION-MARK)
* Deref and DerefMut (C-DEREF)
* bool or Option (C-CUSTOM-TYPE)
* bitflags, not enums (C-BITFLAG)
* Debug (C-DEBUG)
* Debug representation is never empty (C-DEBUG-NONEMPTY)

The packed format for structs uses the order of the calls to serialize_field
to decide on the id of the field for the map.
This is very fragile, and seems to break with the current serde-codegen generated code (sorry I don't have an example here; it's hard to extract it from my current project). I've had cases where a simple to_writer_packed
/ from_reader
failed.
A more robust approach would be to serialize the fields as a sequence instead, as this makes it obvious that the order matters. Or look up the field name when serializing to determine the index in the struct fields array.
#54 (which attempted to fix #51 and #52) actually broke deserialization of Ipv4Addr (or, really, any other human-readability-dependant format) - Deserializer
must have is_human_readable
returning the same value as serializer.
Right now, if you try to serialize Ipv4Addr
, it will serialize it as sequence of bytes, but will try to deserialize as a string, resulting in an error:
called `Result::unwrap()` on an `Err` value: ErrorImpl { code: Message("invalid type: sequence, expected a string"), offset: 0 }
extern crate serde_cbor;
extern crate serde_bytes;

use std::io::Cursor;
use serde_bytes::ByteBuf;

fn main() {
    let input = ByteBuf::from(vec![0u8; 2048 * 1024]);
    let cbor = serde_cbor::to_vec(&input).unwrap();
    let output: ByteBuf = serde_cbor::from_reader(Cursor::new(cbor.as_slice())).unwrap();
    assert_eq!(input.len(), output.len());
}
I would expect input
and output
to have the same length, but instead output
is only 16KiB long.
It works fine though using serde_cbor::from_slice
instead of serde_cbor::from_reader
.
This is related to #61 but not the same -- I want to deserialize CBOR values as they stream in, without throwing away the rest of the buffer; with serde_json
, I do:
fn decode(&mut self) -> Result<(), Error> {
    ({
        let de = Deserializer::from_slice(&self.recv_bytes_queue);
        let mut s = de.into_iter();
        loop {
            match s.next() {
                Some(Ok(value)) => self.recv_value_queue.push_front(value),
                Some(Err(err)) => {
                    if err.is_eof() {
                        break Ok(Some(s.byte_offset()));
                    } else {
                        break Err(err.into());
                    }
                }
                None => break Ok(None),
            }
        }
    })
    .map(|l| match l {
        Some(l) => {
            // TODO: Make this more efficient.
            self.recv_bytes_queue.drain(..l);
        }
        None => self.recv_bytes_queue.clear(),
    })
}
However, no equivalent StreamDeserializer.byte_offset()
exists for serde_cbor
.
This input is correct CBOR that is deserialized in 0.5.0, but not 0.5.1:
bf 67 6d 65 73 73 61 67 65 64 70 6f 6e 67 ff
It should translate to
{
"message": "pong"
}
I believe it fails at this line, with a value of 7: https://github.com/pyfisch/cbor/blob/master/src/de.rs#L296
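For reference, here is a byte-by-byte breakdown of the input above per RFC 7049; each short text string is a 0x60 | length header followed by UTF-8 bytes, and 0xff is the "break" that ends the indefinite-length map.

```rust
// Sketch: decode the indefinite-length map bf 67 6d 65 ... ff by hand.
fn main() {
    let bytes = [
        0xbf, // map, indefinite length
        0x67, b'm', b'e', b's', b's', b'a', b'g', b'e', // text(7) "message"
        0x64, b'p', b'o', b'n', b'g', // text(4) "pong"
        0xff, // "break": ends the indefinite-length map
    ];
    assert_eq!(bytes[1], 0x60 | 7); // key header: text string of length 7
    assert_eq!(&bytes[2..9], b"message");
    assert_eq!(bytes[9], 0x60 | 4); // value header: text string of length 4
    assert_eq!(&bytes[10..14], b"pong");
}
```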
Hello. I want to emit Tag(0) (an RFC 3339 date-time string), but I can't find any hints.
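For reference, the wire format of Tag(0) is easy to write by hand per RFC 7049 §2.4.1: the byte 0xc0 (major type 6, tag number 0) followed by a text string. A hand-rolled sketch, not serde_cbor API (the crate's tag support is the open question here):

```rust
// Sketch: Tag(0) wrapping an RFC 3339 date-time text string.
fn tagged_datetime(s: &str) -> Vec<u8> {
    assert!(s.len() < 24, "sketch handles short strings only");
    let mut out = vec![0xc0, 0x60 | s.len() as u8]; // tag 0, then text header
    out.extend_from_slice(s.as_bytes());
    out
}

fn main() {
    let enc = tagged_datetime("2013-03-21T20:04:00Z");
    assert_eq!(enc[0], 0xc0); // the tag itself
    assert_eq!(enc[1], 0x74); // text string of length 20
    assert_eq!(enc.len(), 22);
}
```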
I am deserializing multiple objects from a slice (like in #20).
I am using the EOF error to detect when I reached the last object, but I have no way to distinguish between an actual EOF and an invalid cbor object.
It would be great to have an Iter implementation that does the right thing, or for the error to carry not the current offset in the stream but the offset where the deserialization began.
Serde supports no_std nicely (judging from the documentation); it would be nice if serde_cbor could too.
I'm currently looking into which parts of the crate would need to be conditional on std's presence, i.e. what cfg(feature = ...) gating is needed; I will keep this issue updated.
Currently the serde_cbor errors are not very helpful, e.g.
Custom("missing field `name`")
— no idea in what object.
A "long form" empty map will not deserialize:
extern crate serde_cbor;

fn main() {
    let _: () = serde_cbor::from_slice(&[191, 255]).unwrap();
}
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: TrailingBytes', ../src/libcore/result.rs:746
note: Run with `RUST_BACKTRACE=1` for a backtrace.
I'm interested in enums (either holding arguments or not) representing the enum type as an integer, similar to https://crates.io/crates/protocol. Is this possible?
Remove travis-cargo and build this crate like described in the Travis CI Rust tutorial.
Note: We do not need to build the docs anymore since there is now docs.rs.
Please update the crate to pick up the latest changes.
I wonder if we can have support for enums in a similar manner to serde_json. Not sure if this would break standards, though. This would be a great addition.
I use bincode in one of my programs, and I'm interested in switching to CBOR to make interoperability with other languages easier. One blocker that I have at the moment is the difficulty of quickly reading a small header struct in a very large file (2–4 GiB). Consider these data structures:
#[derive(Serialize, Deserialize)]
struct Header {
    magic: u32,
    timestamp: i64,
}

#[derive(Serialize, Deserialize)]
struct Bucket {
    header: Header,
    data: HashMap<String, Info>,
}
If I serialize a Bucket
to disk using bincode, I can then deserialize just the header:
bincode::serialize_into(&mut writer, &my_bucket);
let my_header: Header = bincode::deserialize_from(&mut reader).unwrap();
With serde_cbor, I need to read the entire file to get at the header, and that makes the program unacceptably slow. I thought about serializing the header and the HashMap one after the other:
serde_cbor::to_writer(&mut writer, &my_header);
serde_cbor::to_writer(&mut writer, &my_data);
Unfortunately, this doesn't work: when I try to deserialize just the header, the deserializer asserts that the reader has been read entirely.
Would it be possible to either remove this end-of-stream check or to add another deserializer function that doesn't return an error when there is still data in the buffer?
Hi, it'd be nice to use (have an option to use) a compact/packed serialization format (like bincode and rmp) that serializes structs like lists (thus relying on the order of the fields). This allows faster (de)serialization and more compact output.
rmp (msgpack)
[
    2837513946597,
    123456,
    [1, []],
    [
        [2, []],
        200,
        503,
        520,
        [1, []],
        "text/html",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36",
        "https://www.cloudflare.com/",
        "/cdn-cgi/trace"
    ],
    [
        "1.2.3.4",
        8000,
        "www.example.com",
        [2, []]
    ],
    [238, []],
    [3, []],
    "192.168.1.1",
    "metal.cloudflare.com",
    "10.1.2.3",
    123456,
    "10c73629cce30078-LAX"
]
cbor
{
    "timestamp": 2837513946597,
    "zone_id": 123456,
    "zone_plan": null,
    "http": {
        "protocol": null,
        "status": 200,
        "host_status": 503,
        "up_status": 520,
        "method": null,
        "content_type": "text/html",
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36",
        "referer": "https://www.cloudflare.com/",
        "request_uri": "/cdn-cgi/trace"
    },
    "origin": {
        "ip": "1.2.3.4",
        "port": 8000,
        "hostname": "www.example.com",
        "protocol": null
    },
    "country": null,
    "cache_status": null,
    "server_ip": "192.168.1.1",
    "server_name": "metal.cloudflare.com",
    "remote_ip": "10.1.2.3",
    "bytes_dlv": 123456,
    "ray_id": "10c73629cce30078-LAX"
}
I have a structure with a Vec<u8>
that I serialize from Rust, using cbor::ser::to_writer_packed
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct LevelInfo {
    pub name: String,
    pub intro: Vec<Span>,
    pub code: Vec<u8>,
}
in Python, the code
field deserializes to something like
2: [127,
69,
76,
70,
1,
1,
1,
0,
What I would expect is b'\x7fELF\x01\x01\x01\x00'. It looks like it is using an array of individual values instead.
This uses a lot of extra space; is it possible to avoid it and use CBOR's native bytes type?
(sorry if this is a stupid question, could not find anything about annotations!)
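For scale, here is a rough sketch of the two encodings' sizes per RFC 7049 (hand-rolled, assuming short lengths; not serde_cbor API): in an array every u8 value of 24 or more needs a one-byte 0x18 prefix, while a byte string stores the payload raw after a short header.

```rust
// Size of an array of u8 values: one header byte (assumes < 24 elements),
// then one byte per value below 24 and two bytes per value of 24 or more.
fn array_of_u8_len(data: &[u8]) -> usize {
    1 + data.iter().map(|&b| if b < 24 { 1 } else { 2 }).sum::<usize>()
}

// Size of the same data as a native byte string (assumes < 24 bytes):
// one header byte plus the raw payload.
fn byte_string_len(data: &[u8]) -> usize {
    1 + data.len()
}

fn main() {
    // The ELF magic from the question: \x7f 'E' 'L' 'F' 1 1 1 0.
    let elf_magic = [0x7f, b'E', b'L', b'F', 1, 1, 1, 0];
    assert_eq!(byte_string_len(&elf_magic), 9);
    assert_eq!(array_of_u8_len(&elf_magic), 13);
}
```

The usual answer here is the serde_bytes crate: annotating the field with #[serde(with = "serde_bytes")] (or using serde_bytes::ByteBuf) lets the serializer emit a native byte string instead of a sequence.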
Any chance you could update this to support serde 0.7?
Hey. Can you add support for converting data to other formats, as described here?
https://serde.rs/transcode.html
Serde v0.8 is now out. We should update cbor to work with it.
https://serde.rs/enum-representations.html
Serde CBOR should use the "Externally tagged" format by default in the next breaking version.
Also make sure that the other formats round-trip.
The current array based serialization should be kept as a legacy format so old data can be read.
According to the docs, when is_human_readable is false the serializer can produce a more compact and efficient form. I think this is what is expected from a CBOR serializer (the default is true).
Unfortunately, changing it is a breaking change, but I think that's okay in a zero-point release.
So I'll make a PR, if you're fine with it?
Alternative would be to make it configurable, but I don't think human-readable things in binary cbor is a good default.
See https://tools.ietf.org/html/rfc7049#section-3.9.
This makes it possible to compare objects for bitwise equality (it also makes signatures/signature verification easier).
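For illustration, the canonical map-key ordering from RFC 7049 §3.9 can be sketched over already-encoded keys: shorter encodings sort first, with ties broken by bytewise comparison. My own sketch, not serde_cbor API:

```rust
// Sketch of RFC 7049 §3.9 canonical map-key ordering: sort the *encoded*
// keys by length, then bytewise.
fn canonical_sort(keys: &mut Vec<Vec<u8>>) {
    keys.sort_by(|a, b| a.len().cmp(&b.len()).then_with(|| a.cmp(b)));
}

fn main() {
    // Encoded keys: text "a" (0x61 0x61), int 100 (0x18 0x64), int 10 (0x0a).
    let mut keys = vec![vec![0x61, 0x61], vec![0x18, 0x64], vec![0x0a]];
    canonical_sort(&mut keys);
    // int 10 is shortest; int 100 beats "a" bytewise (0x18 < 0x61).
    assert_eq!(keys, vec![vec![0x0a], vec![0x18, 0x64], vec![0x61, 0x61]]);
}
```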
Unless I'm mistaken, the serializer can't possibly produce any error but an IO error. To make error handling simpler, it would be nice if Serializer::Error
was set to io::Error
instead of serde_cbor::Error
. One can always just convert it to a serde_cbor::Error
if necessary.
Parse bytestrings of indefinite length without using the ByteBuf wrapper.
Also refactor parsing indefinite length UTF8-strings.
It would be nice if unknown simple values would be parsed and if custom simple values could be serialized. This is low priority as there are currently no simple values registered besides those from RFC 7049 which are supported.
Additionally Null and Undefined should have different deserializations.
I was attempting to (de-)serialize an enum from an integer, as described in https://serde.rs/enum-number.html.
When deserializing, I got the somewhat confusing error message invalid type: integer `1`, expected [My Enum]
.
The issue is, as I found out later, that cbor
uses forward_to_deserialize_any!
. The implementation of deserialize_any()
then visits the integer type actually encoded in the data rather than the requested / originally encoded type (in my case u8
is visited rather than u32
).
This behavior was very surprising to me. I could not find anything conclusive in the serde docs about how the deserializer should behave in this case, however ignoring the explicitly requested type seems wrong to me.
I would like to serialize a struct and store it in a file, padded out with zeroes to a fixed length. But when I read the file in, including zeroes, and try to deserialize it, I get a TrailingData
error. It seems that serde_cbor::from_vec
and serde_cbor::from_reader
require the input to have precisely the expected size. Do I have to store the packed structure's size and write that to the file too, outside of the packed structure? That seems redundant. It's also difficult, because to_writer
does not return the amount of data written.