Comments (26)
Might be interesting. What does epb decode booleans and bytes to? I'm thinking that if they are always decoded to `true|false` and `binary()`, then for the data types it is mostly a matter of allowing extra formats for encoding; maybe it wouldn't even need to be an option. Perhaps the situation is the same or similar for the functions: they could be provided in parallel: `encode` could just call `encode_msg`?
from gpb.
By the way, are there any differences in file names? For example, if the module file ends with `_pb.erl`, does the header file end in `_pb.hrl` too? (I vaguely remember there being some difference in naming for the `.erl` vs `.hrl` files for epb, but I'm not sure about it.) If there is a difference, then I think it is not very well handled by the `mod_suffix` option, thus making the case for a compatibility option stronger.
I'll have to read up a bit on the epb interface too, to weigh that in.
From my understanding, erlang_protobuffs accepts both strings and binaries as proto strings. Won't this also pose an issue for gpb, since it accepts either strings or binaries?
@lrascao, for strings, on encoding, gpb actually accepts strings, iolists and binaries, since it calls `unicode:characters_to_binary` to turn the input data into a UTF-8 binary. On decoding, though, the `strings_as_binaries` option (`-strbin`) controls whether to return binaries or strings (lists). (Hmm... I now notice that the type specs do not reflect this... will fix.)

Similarly, for `float` and `double`, on encoding gpb accepts both floating point values and integers, mostly since the underlying bit syntax `N:32/float-little` happens to accept N as both a float and an integer, but on decoding a floating point value is returned.

(From that point of view, `0` and `1` as encoding alternatives for booleans, and lists as an encoding alternative for `bytes`, would fit an existing pattern.)
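The underlying Erlang behaviour described here is easy to check. A minimal sketch using only the standard library (nothing gpb-specific):

```erlang
%% Plain Erlang, nothing gpb-specific.

%% Bit syntax accepts an integer where a float segment is written,
%% producing the same bytes as the corresponding float:
true = (<<3:32/float-little>> =:= <<3.0:32/float-little>>),

%% unicode:characters_to_binary/1 accepts strings, binaries and iolists alike:
<<"hi">> = unicode:characters_to_binary("hi"),
<<"hi">> = unicode:characters_to_binary(<<"hi">>),
<<"hi">> = unicode:characters_to_binary(["h", <<"i">>]).
```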
Answers to your questions:
> What does epb decode booleans and bytes to?

Booleans are decoded to the `true` and `false` atoms. `bytes` fields are decoded to `binary()`.
> By the way, are there any differences with file names? For example, if the module file ends with `_pb.erl`, does the header file end in `_pb.hrl` too?

Yep! I used these gpb options when compiling `.proto` files in riak_pb.
Here is one compatibility issue: epb allows lists as data for a `bytes` field:

https://github.com/basho/erlang_protobuffs/blob/master/src/protobuffs.erl#L152-L153
I noticed this because Riak has one message with a `bytes` field that sometimes receives list data. This field should have been declared as a `string`. I added a function to "fix up" the data:

https://github.com/basho/riak_pb/blob/features/lrb/use-gpb/src/riak_pb_codec.erl#L406-L412
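Such a "fix up" might be sketched as below. This is a hypothetical helper (the module name and shape here are made up; the real code is at the riak_pb_codec link above):

```erlang
-module(bytes_fixup).
-export([to_binary/1]).

%% Hypothetical sketch: coerce list data destined for a bytes field
%% into a binary before handing it to the encoder.
to_binary(B) when is_binary(B) -> B;
to_binary(L) when is_list(L)   -> iolist_to_binary(L).
```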
I see that, in the spec, `string` and `bytes` have the same on-the-wire format. I should be able to change `bytes` to `string` in this message.
However, other messages use `bytes` fields and, without a lot of code review, I can't be 100% sure they always have `binary()` input data. What do you think about adding the ability to use a list as input data for `bytes` fields in gpb?
> I see that, in the spec, string and bytes have the same on-the-wire format. I should be able to change bytes to string in this message.

In the protobuf format (on the wire), a `string` field must always be UTF-8, whereas `bytes` doesn't have to be. For instance, the octets `255,255,255` on the wire are valid as `bytes`, but not decodable as a `string`. That value is of course not a realistic `errmsg`, but if one side encodes an error message in Latin-1, it is not necessarily decodable as UTF-8, and even if it were, the text itself might change. If the characters encoded are always UTF-8, or 7-bit ASCII, then it is fine to switch from `bytes` to `string`.
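The point about those octets can be verified with the standard library (a quick check, nothing gpb-specific):

```erlang
%% The octets 255,255,255 decode fine as Latin-1, but are not valid UTF-8:
[255,255,255] = unicode:characters_to_list(<<255,255,255>>, latin1),
{error, _, _} = unicode:characters_to_list(<<255,255,255>>, utf8).
```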
For the record, all of the items in the first post are now included in 3.24.0, but keeping this open for a while in case we come to think of anything related.
Hi Tomas -
I found these functions that erlang_protobuffs creates:
- https://github.com/basho/riak_kv/blob/2.1/src/riak_kv_vnode.erl#L2220
- https://github.com/basho/riak_kv/blob/2.1/src/riak_kv_vnode.erl#L2232
What do you think about doing the same in gpb? Since these are literally the only two places in the Riak codebase where these functions are used, it's not as important as the other epb compatibility functions that were added.
What do these functions do?
Decoding:

```erlang
riak_core_pb:decode_riakobject_pb(DecodedObject)
```

is the same as (using gpb):

```erlang
riak_core_pb:decode_msg(DecodedObject, riakobject_pb)
```

Encoding:

```erlang
riak_core_pb:encode_riakobject_pb(#riakobject_pb{bucket=Bucket, key=Key, val=Value})
```

is this in gpb:

```erlang
riak_core_pb:encode_msg(#riakobject_pb{bucket=Bucket, key=Key, val=Value})
```

These are very much "syntactic sugar" functions that would exist in gpb to provide a "drop-in" migration path from epb.
Ah, I see, so for every `<Msg>`, there's also an `encode_<Msg>/1` and a `decode_<Msg>/1`.

I fired up epb just now, to actually take a look at the code it generates. For every `encode_<Msg>/1`, there are two function clauses: one if the argument is a list and one if the argument is a record. The second clause is the equivalent of `encode_msg(<Msg>)`, so it seems straightforward to generate compatibility functions for (but I'd like to meditate a little more over this).
However, the first clause calls an internal helper called `delimited_encode`. It seems this expects a list, `[<Msg>]`, and encodes it into `Size1 + encode(MsgElem1) + Size2 + encode(MsgElem2) + ...` (the `+` here meaning roughly iolist concatenation). The only thing in the protobuf encoding spec I can think of for this is a field which is `repeated` and has the option `packed`. For such a field, the encoded result would look like below, where the tail matches the result of `delimited_encode`:

```
<field number and wire type> + \
total size + \
Size1 + encode(MsgElem1) + \
Size2 + encode(MsgElem2) + ...
```
I do not want to write compatibility functions for this first clause, mostly because I find it odd (more below) and also because I think it would complicate the gpb code generator quite a bit to expose this. The oddness I perceive is twofold: not complete, and not general. Not complete: it seems to offer the possibility to produce a result that is not a complete decodable unit, since the leading two parts are missing. Not general: if we see the `Msg` as the type, then the encoding can be generalized only to a few other types: `string` and `bytes` would also be encoded like this, but the rest (the majority) would not; there would be no `Size1`, `Size2`, etc.
Similarly, for decoding, there's a `decode_delimited_<Msg>` doing the opposite, which I don't really want to support either.
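A rough sketch of the shape being described, guessed from the behaviour above rather than taken from epb's actual code (the module and function names here are made up):

```erlang
-module(delimited_sketch).
-export([delimited_encode/1]).

%% Guessed shape of a delimited encode: length-prefix each
%% already-encoded element and return the pieces as an iolist.
delimited_encode(EncodedElems) when is_list(EncodedElems) ->
    [[varint(iolist_size(E)), E] || E <- EncodedElems].

%% Minimal protobuf base-128 varint encoder for the length prefixes.
varint(N) when N < 128 -> <<N>>;
varint(N) -> <<((N band 127) bor 128), (varint(N bsr 7))/binary>>.
```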
Tomas, I agree that the delimited encode function that takes a list argument is not worth the work (nor is `decode_delimited`), and to be honest, I didn't even know they existed!
There's an obvious collision for per-message compatibility encode/decode functions if a message happens to be named just `msg`: the function name would be `encode_msg`, but for gpb, that's also the major entry function for encoding. One might argue that the arity differs, but if one would like to add a second argument, `Options`, then it collides.
(Pre-emptive reasoning: why would one want to add an `Options` argument when there's no such thing in epb? Well, to a person who does not have the epb background, it might seem strange that the `encode_<Msg>` API does not support options, and that options can only be specified when using `encode_msg`.)
So the question this builds up to: do you think it would be much of a nuisance if the compatibility functions were named `encode_msg_<Msg>` rather than `encode_<Msg>`? The collision outlined above would be avoided, but on the other hand, one would need to edit code to switch from epb to gpb. (Since editing would be needed, maybe this kills the idea of almost-compatibility functions; one could just as well edit the code to call `encode_msg` directly instead?)
I'd like the name collision to get a resolution, or else I'll drop the idea. (Not being able to generate code if a message happens to be named `msg` is not ok.)
Another thought (brainstorming): reduce the area of collision by only generating compatibility functions when an option is specified. They would still collide, but I could accept those cases. Cons: other compatibility functions should also be under this option, and the code-generating code might get a little messier.
An `epb_compatibility` option could just enable the per-message `encode`/`decode` case discussed here and the global `encode`/`decode` functions that were part of #65. The boolean and iolist functionality I added could be seen as a general gpb enhancement that also just happens to make gpb more compatible with epb.
Agree. I have something locally for this now.
Hi, the just-pushed 3.24.4 contains the option `epb_compatibility` (`-epb` on the command line), which activates generation of the functions `encode`/`decode`/`encode_<MsgName>`/`decode_<MsgName>`. None of these functions are generated now when no `epb_compatibility` option is specified, so this is an incompatibility with the previous behaviour.
I also added a generation-time check for a message named `msg`: if there is one, generation fails with an error saying it would collide with the standard `encode_msg`/`decode_msg` functions (erroring as early as possible).
Revisiting... Now that we have an option (a good thing, I think), maybe this option should also imply:

- `{module_name_suffix, "_pb"}`
- `{msg_name_to_lower, true}`

to make life easier for any transitioners?
Hm, I'd like to have `epb_compatibility`, but it complains that I can't have that and `maps` together. I mostly just wanted the `encode` and `decode` function names, since I have a bunch of other code that already uses them, but I wanted maps support, so I started looking at gpb. Is there any way to have both?
For the `decode` function, things could work straight off with maps. But the `encode` function would need an extra argument: the name of the message. This is because with maps, the message is just a map of its fields, nothing more, so something is needed to say which message the map represents. With such an extra argument to `encode`, the calling code would have to be reshaped anyway, or so I reasoned.
(About maps vs records, I don't know if this is a hot potato or not, but here goes: one could have a special `'%@type'` field or such, to make maps more like records, so that the value also carries its type or purpose, but I think the right thing is to avoid such a type field. No other Erlang value carries its purpose: integers are integers regardless of whether they're used for timeouts or indices; references are references regardless of whether they are for monitoring a process or were just created with `make_ref()`; and so on. Writing programs tends to go just fine without encoding the purpose of a value into it, so I figured this would be how one would use messages as maps too.)
However, the `encode_<MsgName>` functions could work with maps (but currently don't), though for your use-case, I suppose going that road would also require you to edit the code(?)
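The records-vs-maps point can be illustrated with plain Erlang (a toy example, not gpb's API; the record name is borrowed from the riak example earlier in the thread):

```erlang
-module(tag_demo).
-export([msg_name/1]).

-record(riakobject_pb, {bucket, key, val}).

%% A record value carries its type in the record tag...
msg_name(R) when is_record(R, riakobject_pb) ->
    riakobject_pb;
%% ...but a map of the same fields says nothing about which
%% message it is, so the caller would have to supply the name.
msg_name(M) when is_map(M) ->
    unknown.
```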
My bad! I actually misunderstood what the `maps` option was for. I was thinking it was related to whether the proto `map<_,_>` type is decoded to an Erlang map, not whether the encode function takes a map or a record. I prefer a record, since it is a strict representation anyway :). Thanks!
Ah, ok, no problem, easy enough to confuse the two.
In just-pushed 3.25.0, I've updated the `epb_compatibility` option to also imply the two options discussed above:

- `{module_name_suffix, "_pb"}`
- `{msg_name_to_lower, true}`

With this, I'm closing this issue. Should any of you come to think of some more compatibility things, just re-open this issue or open a new one.
@tomas-abrahamsson thanks! I am checking in after being on vacation for the past few weeks.
@tomas-abrahamsson `epb_compatibility` and the other changes you made are great. Thanks so much again.
@lukebakken hehe, you did much of it yourself! Thanks to you too!