Giter VIP home page Giter VIP logo

zeropb's People

Contributors

danhhz avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

zeropb's Issues

specialize offsets map

The offsets map is currently implemented with FastIntMap since it was convenient, but FastIntMap has to either be tuned in one general way or else multiple versions code generated to get the nice compiler specializations for all the masks. We could make the happy zero-allocation case more likely if we specialize it at code generation time using our knowledge of the message fields.

For densely packed fields an array is sufficient. It'd be easiest to start with a []uint32 but the values are offsets into a message and so we likely don't need the full range of 4294967296, so that would use more space than necessary. Maybe a followup to make it smaller? We'd have a similar problem to FastIntMap then where we'd have to have all the various sizes we want to use in the zeropb package (or maybe codegen the crazy ones into the generated proto code, eek). We also have to decide how to handle offsets that are larger than whatever range we pick. If we want any valid input to work, we need to fall back to allocating a map and we've basically re-derived FastIntMap (this is exactly why I used it in the first place).

My initial thought is starting with []uint32 and eating the struct size. That's definitely the simplest. It's also not as bad as it sounds. Most of the fields that would be generated other protobuf libraries are 4-8 bytes per field, which means best case is the same size struct and worst case is 2x. Sigh, maybe uint16 really is the way to go, seems like 65536 bytes should be enough. Instead of the map fallback, decode could error if given a buffer longer than max offset, which seems like a reasonable limitation.

If we do specialize densely packed fields, we need a heuristic and also a second impl for non-densely packed fields. My initial idea for the heuristic is if there are no more holes than fields. That is, if max field id <= 2 * num fields. The second impl could initially always be a map (unfortunate but predictable performance). We could add a proto option to let the user override the heuristic if they're okay with the tradeoff.

Both offset impls would be structs that happen to expose the same set of methods. Then the codegen would simply put one or the other in the generated message struct and the rest of the codegen would be the same. No interfaces so no allocations.

There's probably things we could eventually do to improve the second impl if we cared (though I initially do not). It could be an array (of a size computed by some heuristic over the field types) and a slice view of that array (or I guess it'd have to be two of these?). Then we keep all the fields sorted and do everything via binary search, appending to add new parsed fields. That would be allocation free until we got too many fields, which isn't bad. It also doesn't have the unpredictability of FastIntMap's flip from small to large.

support encoding

First, an assumption that encoding should use the same structs as decoding. These structs don't have a native go field per message field, they just have an embedded byte slice and accessors. I think it's possible to do encoding entirely within this slice in response to calling mutator functions. In short, always maintain a serialized version of the message in the buffer (also keeping the offsets up to date).

  • setting a previously unset single field: append the field tag and data to the end of the buffer, update the offsets map
  • unsetting a set single field: copy everything in the buffer forward over the old data and remove from the offset map, adjust all affected offsets in the map
  • setting a previously set single field: if the field tag + data is exactly the same size as the old tag + data, overwrite; if not, unset the field and then proceed as if setting an unset field
  • appending to a repeated field: append the field tag and data to the end of the buffer, set the offsets map if it's unset (if the repeated field was previously empty)
  • unsetting a repeated field: find each instance of the field in the buffer and compact the rest of the buffer over it, adjust all affected offsets in the map
  • bulk overwrite of repeated field: if non-empty, unset; then proceed to append them all as above

Most modifications to simple fields will likely be the same size and get the fast case. Ditto appending to a repeated field. Anything that ends up compacting will be slower, but I'd guess most uses of protobufs don't repeatedly modify the fields anyway.

Unintended benefit of all this: Encode is free!

Note that this even works with unknown fields, which are just passed through unmodified (though perhaps earlier in the buffer than they were.)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.