Comments (26)
Might be interesting. What does epb decode booleans and bytes to? I'm thinking that if they are always decoded to `true|false` and `binary()`, then for the data types it is mostly a matter of allowing extra formats for encoding; maybe it wouldn't even need to be an option. Perhaps the situation is the same or similar for the functions: they could be provided in parallel: `encode` could just call `encode_msg`?
from gpb.
By the way, are there any differences in file names? For example, if the module file ends with `_pb.erl`, does the header file end in `_pb.hrl` too? (I vaguely remember there being some difference in naming for the `.erl` vs `.hrl` files for epb, but I'm not sure about it.) If there is a difference, then I think it is not very well handled by the `mod_suffix` option, thus making the case for a compatibility option stronger.
I'll have to read up a bit on the epb interface too, to weigh that in.
From my understanding, erlang_protobuffs accepts both strings and binaries as proto strings. Won't this also pose an issue for gpb, since it accepts either strings or binaries?
@lrascao, for strings, on encoding, gpb actually accepts strings, iolists and binaries, since it calls `unicode:characters_to_binary` to turn the input data into a UTF-8 binary. On decoding, though, the `strings_as_binaries` option (`-strbin`) controls whether to return binaries or strings (lists). (Hmm... I now notice that the type specs do not reflect this... will fix.)

Similarly, for `float` and `double`, on encoding gpb accepts both floating point values and integers, mostly since the underlying bit syntax `N:32/float-little` happens to accept N as both a float and an integer, but on decoding a floating point value is returned.

(From that point of view, `0` and `1` as encoding alternatives for booleans, and lists as an encoding alternative for `bytes`, would fit an existing pattern.)
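The underlying Erlang behaviour described here is easy to check. A minimal sketch using only the standard library (nothing gpb-specific):

```erlang
%% Plain Erlang, nothing gpb-specific.

%% Bit syntax accepts an integer where a float segment is written,
%% producing the same bytes as the corresponding float:
true = (<<3:32/float-little>> =:= <<3.0:32/float-little>>),

%% unicode:characters_to_binary/1 accepts strings, binaries and iolists alike:
<<"hi">> = unicode:characters_to_binary("hi"),
<<"hi">> = unicode:characters_to_binary(<<"hi">>),
<<"hi">> = unicode:characters_to_binary(["h", <<"i">>]).
```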
Answers to your questions:
> What does epb decode booleans and bytes to?

Booleans are decoded to the `true` and `false` atoms. `bytes` fields are decoded to `binary()`.
> By the way, are there any differences with file names? For example, if the module file ends with `_pb.erl`, does the header file end in `_pb.hrl` too?

Yep! I used these gpb options when compiling `.proto` files in riak_pb.
Here is one compatibility issue: epb allows lists as data for a `bytes` field:

https://github.com/basho/erlang_protobuffs/blob/master/src/protobuffs.erl#L152-L153
I noticed this because Riak has one message with a `bytes` field that sometimes receives list data. This field should have been declared as a `string`. I added a function to "fix up" the data:

https://github.com/basho/riak_pb/blob/features/lrb/use-gpb/src/riak_pb_codec.erl#L406-L412
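Such a "fix up" might be sketched as below. This is a hypothetical helper (the module name and shape here are made up; the real code is at the riak_pb_codec link above):

```erlang
-module(bytes_fixup).
-export([to_binary/1]).

%% Hypothetical sketch: coerce list data destined for a bytes field
%% into a binary before handing it to the encoder.
to_binary(B) when is_binary(B) -> B;
to_binary(L) when is_list(L)   -> iolist_to_binary(L).
```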
I see that, in the spec, `string` and `bytes` have the same on-the-wire format. I should be able to change `bytes` to `string` in this message.
However, other messages use `bytes` fields and, without a lot of code review, I can't be 100% sure they always have `binary()` input data. What do you think about adding the ability to use a list as input data for `bytes` fields in gpb?
> I see that, in the spec, string and bytes have the same on-the-wire format. I should be able to change bytes to string in this message.

In the protobuf format (on the wire), a `string` field must always be UTF-8, whereas `bytes` doesn't have to be. For instance, the octets `255,255,255` on the wire are valid as `bytes`, but not decodable as a `string`. That value is of course not a realistic `errmsg`, but if one side encodes an error message in Latin-1, it is not necessarily decodable as UTF-8, and even if it were, the text itself might change. If the characters encoded are always UTF-8, or 7-bit ASCII, then it is fine to switch from `bytes` to `string`.
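The point about those octets can be verified with the standard library (a quick check, nothing gpb-specific):

```erlang
%% The octets 255,255,255 decode fine as Latin-1, but are not valid UTF-8:
[255,255,255] = unicode:characters_to_list(<<255,255,255>>, latin1),
{error, _, _} = unicode:characters_to_list(<<255,255,255>>, utf8).
```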
For the record, all of the items in the first post are now included in 3.24.0, but keeping this open for a while in case we come to think of anything related.
Hi Tomas -
I found these functions that erlang_protobuffs creates:
- https://github.com/basho/riak_kv/blob/2.1/src/riak_kv_vnode.erl#L2220
- https://github.com/basho/riak_kv/blob/2.1/src/riak_kv_vnode.erl#L2232
What do you think about doing the same in gpb? Since these are literally the only two places in the Riak codebase where these functions are used, it's not as important as the other epb compatibility functions that were added.
What do these functions do?
Decoding:

```erlang
riak_core_pb:decode_riakobject_pb(DecodedObject)
```

is the same as (using gpb):

```erlang
riak_core_pb:decode_msg(DecodedObject, riakobject_pb)
```

Encoding:

```erlang
riak_core_pb:encode_riakobject_pb(#riakobject_pb{bucket=Bucket, key=Key, val=Value})
```

is this in gpb:

```erlang
riak_core_pb:encode_msg(#riakobject_pb{bucket=Bucket, key=Key, val=Value})
```

These are very much "syntactic sugar" functions that would exist in gpb to provide a "drop-in" migration path from epb.
Ah, I see, so for every `<Msg>`, there's also an `encode_<Msg>/1` and a `decode_<Msg>/1`.

I fired up epb just now, to actually take a look at the code it generates. For every `encode_<Msg>/1`, there are two function clauses: one if the argument is a list and one if the argument is a record. The second clause is the equivalent of `encode_msg(<Msg>)`, so it seems straightforward to generate compatibility functions for (but I'd like to meditate a little more over this).
However, the first clause calls an internal helper called `delimited_encode`. It seems this expects a list, `[<Msg>]`, and encodes it into `Size1 + encode(MsgElem1) + Size2 + encode(MsgElem2) + ...` (the `+` here meaning roughly iolist concatenation). The only thing in the protobuf encoding spec I can think of for this is a field which is `repeated` and has the option `packed`. For such a field, the encoded result would look like below, where the tail matches the result of `delimited_encode`:

```
<field number and wire type> + \
total size + \
Size1 + encode(MsgElem1) + \
Size2 + encode(MsgElem2) + ...
```
I do not want to write compatibility functions for this first clause, mostly because I find it odd (more below) and also because I think it would complicate the gpb code generator quite a bit to expose this. The oddness I perceive is twofold: not complete, and not general. Not complete: it seems to offer the possibility to produce a result that is not a complete decodable unit, since the leading two parts are missing. Not general: if we see the `Msg` as the type, then the encoding can be generalized only to a few other types: `string` and `bytes` would also be encoded like this, but the rest (the majority) would not; there would be no `Size1`, `Size2`, etc.
Similarly, for decoding, there's a `decode_delimited_<Msg>` doing the opposite, which I don't really want to support either.
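A rough sketch of the shape being described, guessed from the behaviour above rather than taken from epb's actual code (the module and function names here are made up):

```erlang
-module(delimited_sketch).
-export([delimited_encode/1]).

%% Guessed shape of a delimited encode: length-prefix each
%% already-encoded element and return the pieces as an iolist.
delimited_encode(EncodedElems) when is_list(EncodedElems) ->
    [[varint(iolist_size(E)), E] || E <- EncodedElems].

%% Minimal protobuf base-128 varint encoder for the length prefixes.
varint(N) when N < 128 -> <<N>>;
varint(N) -> <<((N band 127) bor 128), (varint(N bsr 7))/binary>>.
```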
Tomas, I agree that the delimited encode function that takes a list argument is not worth the work (nor is `decode_delimited`), and to be honest, I didn't even know they existed!
There's an obvious collision for per-message compatibility encode/decode functions if a message happens to be named just `msg`: the function name would be `encode_msg`, but for gpb, that's also the major entry function for encoding. One might argue that the arity differs, but if one would like to add a second argument, `Options`, then it collides.
(Pre-emptive reasoning: why would one want to add an `Options` argument when there's no such thing in epb? Well, to a person who does not have the epb background, it might seem strange that the `encode_<Msg>` API does not support options, and that options can only be specified when using `encode_msg`.)
So the question this builds up to: do you think it would be much of a nuisance if the compatibility functions were named `encode_msg_<Msg>` rather than `encode_<Msg>`? The collision outlined above would be avoided, but on the other hand, one would need to edit code to switch from epb to gpb. (Since editing would be needed, maybe this kills the idea of almost-compatibility functions; one could just as well edit the code to call `encode_msg` directly instead?)
I'd like the name collision to get a resolution, or else I'll drop the idea. (Not being able to generate code if a message happens to be named `msg` is not ok.)
Another thought (brainstorming): reduce the area of collision by only generating compatibility functions when an option is specified. They would still collide, but I could accept those cases. Cons: other compatibility functions should also be under this option, and the code-generating code might get a little messier.
An `epb_compatibility` option could just enable the per-message `encode`/`decode` case discussed here and the global `encode`/`decode` functions that were part of #65. The boolean and iolist functionality I added could be seen as a general gpb enhancement that also just happens to make gpb more compatible with epb.
Agree. I have something locally for this now.
Hi, the just-pushed 3.24.4 contains the option `epb_compatibility` (`-epb` on the command line), which activates generation of the functions `encode`/`decode`/`encode_<MsgName>`/`decode_<MsgName>`. None of these functions are generated now when no `epb_compatibility` option is specified, so this is an incompatibility with the previous behaviour.
I also added a generation-time check for a message named `msg`: if there is one, generation fails with an error saying it would collide with the standard `encode_msg`/`decode_msg` functions (erroring as early as possible).
Revisiting... Now that we have an option (a good thing, I think), maybe this option should also imply:

- `{module_name_suffix, "_pb"}`
- `{msg_name_to_lower, true}`

to make life easier for any transitioners?
Hm, I'd like to have `epb_compatibility`, but it complains that I can't have that and `maps` together. I mostly just wanted the `encode` and `decode` function names, since I have a bunch of other code that already uses them, but I wanted maps support, so I started looking at gpb. Is there any way to have both?
For the `decode` function, things could work straight off with maps. But the `encode` function would need an extra argument: the name of the message. This is because with maps, the message is just a map of its fields, nothing more, so something is needed to say which message the map represents. With such an extra argument to `encode`, the calling code would have to be reshaped anyway, or so I reasoned.
(About maps vs records, I don't know if this is a hot potato or not, but here goes: one could have a special `'%@type'` field or such, to make maps more like records, so that the value also carries its type or purpose, but I think the right thing is to avoid such a type field. No other Erlang value carries its purpose: integers are integers regardless of whether they're used for timeouts or indices; references are references regardless of whether they are for monitoring a process or were just created with `make_ref()`; and so on. Writing programs tends to go just fine without encoding the purpose of a value into it, so I figured this would be how one would use messages as maps too.)
However, the `encode_<MsgName>` functions could work with maps (but currently don't), though for your use-case, I suppose going that road would also require you to edit the code(?)
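The records-vs-maps point can be illustrated with plain Erlang (a toy example, not gpb's API; the record name is borrowed from the riak example earlier in the thread):

```erlang
-module(tag_demo).
-export([msg_name/1]).

-record(riakobject_pb, {bucket, key, val}).

%% A record value carries its type in the record tag...
msg_name(R) when is_record(R, riakobject_pb) ->
    riakobject_pb;
%% ...but a map of the same fields says nothing about which
%% message it is, so the caller would have to supply the name.
msg_name(M) when is_map(M) ->
    unknown.
```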
My bad! I actually misunderstood what the `maps` option was for. I was thinking it was related to whether the proto `map<_,_>` type is decoded to an Erlang map, not whether the encode function takes a map or a record. I prefer a record, since it is a strict representation anyway :). Thanks!
Ah, ok, no problem, easy enough to confuse the two.
In just-pushed 3.25.0, I've updated the `epb_compatibility` option to also imply the two options discussed above:

- `{module_name_suffix, "_pb"}`
- `{msg_name_to_lower, true}`

With this, I'm closing this issue. Should any of you come to think of some more compatibility things, just re-open this issue or open a new one.
@tomas-abrahamsson thanks! I am checking in after being on vacation for the past few weeks.
@tomas-abrahamsson `epb_compatibility` and the other changes you made are great. Thanks so much again.
@lukebakken hehe, you did much of it yourself! Thanks to you too!