Giter VIP home page Giter VIP logo

Comments (9)

jemc avatar jemc commented on August 16, 2024 1

I definitely agree with the general idea, but I'd prefer to see a signature like this:

fun box peek_u32_be(seq: ReadSeq[U8] box, offset: USize): U32 ?
fun box peek_u32_le(seq: ReadSeq[U8] box, offset: USize): U32 ?
  1. Using ReadSeq instead of Array is more flexible.
  2. I'm not sure of the use case for a call site that need to use the endian-ness as an argument because it doesn't know the choice statically, and I think that pretty much the entire method body inside the method would be a match block that uses a different strategy based on the argument. In other words, I don't think it benefits the caller or the method implementation to do this, so I think it's an over-generalization of the abstraction. If we really wanted to proceed with this though, I'd prefer to see it as a type parameter rather than a type argument so that we aren't paying a runtime cost for the match and the argument.

I suggest that it could be a great addition to add endian-less function that would pick automatically one of the two method depending on which architecture

I don't think this is actually a common use case? I think pretty much everyone using a feature like this in Pony is using it to read either a network protocol or a file format, which is designed either as big-endian or little-endian as part of the protocol/format specification, regardless of the system architecture it's running on. So I'd be really surprised to see any code "in the wild" that doesn't make a static choice for big-endian vs little-endian.

from rfcs.

jemc avatar jemc commented on August 16, 2024 1

@SeanTAllen - As I understood it, the proposal was not to change the way buffered.Reader works, but to provide a new primitive that has the read-bytes-into-data-types features of buffered.Reader, but without the actual stateful buffering features that buffered.Reader provides.

That is, the idea is that you should be able to, for example, read a big-endian I32 from a particular offset in a ReadSeq[U8] without doing any additional allocations related to buffering and interceding for multiple chunks like buffered.Reader does.

from rfcs.

codec-abc avatar codec-abc commented on August 16, 2024

Totally agree with 1.
For 2, I mostly agree and I think too that in most cases the choice between BigEndian and LittleEndian is known statically and it is better. From a theoretical point of view, it may however happen that a user wish to parse the same data either with BigEndian or LittleEndian depending on a factor known at runtime. It seems that TIFF files can be encoded in both format and a flag tell which one to use. So I guess it may be better that the functions in the std lib can handle that case too. By the way I didn't understood what you mean exactly by:

I'd prefer to see it as a type parameter rather than a type argument

About

I suggest that it could be a great addition to add endian-less function that would pick automatically one of the two method depending on which architecture

To give some context, this is a case where I would have needed it if I wanted to support multiple architecture for the same Pony app:

I am doing a wrapper for SDL2. To deal with events (and emulate polymorphism in C) SDL2 returns a union of different structs. The first byte, give the event type and can be used to cast the union to the correct structure. What I did (by doing the same thing that the Rust wrapper) is generate a byte array big enough to fit all the structs of the union and gave it to SDL2 via FFI. Then, once SDL2 has filled it, I read the first byte to know the event type and then read the content using functions like peek_u32 to retrieve the meaningful event's data from the byte array. Now, If I wanted the same application to run on LittleEndian (say x86/64) and BigEndian (ARM) I would need to check each time on which architecture the code is running to call the correct peek function. However, I do agree that this case is rather uncommon.

To sump up, I think the following functions covers all cases while maintaining a simple API (except maybe for the functions' names) with good performance for the common cases.

// The functions you probably want to use in 99% of cases.
fun box peek_u32_be(seq: ReadSeq[U8] box, offset: USize): U32 ?
fun box peek_u32_le(seq: ReadSeq[U8] box, offset: USize): U32 ?

// A one that just call one of the upper 2 depending of a parameter
type EndianMode is (BigEndian | LittleEndian)
fun box peek_u32(array: ReadSeq[U8] box, endianMode: EndianMode, offset: USize) : U32 ?

// And basically a one that retrieve the processor EndianMode and call the previous one
fun box peek_u32_processor_endian_mode(seq: ReadSeq[U8] box, offset: USize): U32 ?

from rfcs.

jemc avatar jemc commented on August 16, 2024

@codec-abc - Thanks for giving those concrete examples (TIFF and SDL2) to justify the motivation. You convinced me 😉.

I like the idea you shared of providing both variants so that those who know the choice statically pay no extra perf cost, and get a more succinct syntax.

👍


I didn't understood what you mean exactly by:

I'd prefer to see it as a type parameter rather than a type argument

Whoops, that was a bit of a typo. I should have said "I'd prefer to see it as a type parameter rather than a runtime parameter". Something like this:

fun box peek_u32[E: EndianMode](array: Array[U8] box, offset: USize) : U32 ?

However, I'm okay with the approach you shared above in your latest comment, so I don't think that's necessary.

from rfcs.

codec-abc avatar codec-abc commented on August 16, 2024

Since this issue is focused on a very small point it shouldn't be too hard to write a proper RFC (and even the implementation after that) which I will try to do in the near future. But before writing a proper one, I would like your opinion -and anyone interested too- on the following points:

  • Do theses functions belong to the Buffered Reader package or should be put in another package (or file)?
  • What about the names of those functions? Should the word peek be in it? Moreover, peek_u32_processor_endian_mode is rather long but I fear shortening it will reduce readability.
  • A boolean can be used to choose between Big and Little Endian, but I don't like. I prefer the type EndianMode is (BigEndian | LittleEndian) approach because even without named parameters it is clear what is going on at the call site. Plus, using a boolean just saves a very few characters.
  • What about order of parameters and their names? Personally, I like having the offset at the end with a default value of 0.

from rfcs.

SeanTAllen avatar SeanTAllen commented on August 16, 2024

It's unclear to me what the actual changes are. I use Reader in a lot of high performance code so I'm very interested in the actual changes to it and the possible performance ramifications.

I'd rather see this does as a separate package/class that could be eventually brought together if we think that is appropriate.

Ideally, I'd like to see these exist as a 3rd party library first. Evolve some there (as I suspect they are going to be prone to some API changes early on) and then moved over for inclusion in the standard library.

from rfcs.

codec-abc avatar codec-abc commented on August 16, 2024

The changes are to add the following stateless functions to the standard library

fun peek_X_be(seq: ReadSeq[U8] box, offset: USize): X?
fun peek_X_le(seq: ReadSeq[U8] box, offset: USize): X?
fun peek_X[E: EndianMode](array: ReadSeq[U8] box, offset: USize): X?
fun peek_X_processor_endian_mode(seq: ReadSeq[U8] box, offset: USize): X?

for any X in

(U16 | I16 | U32 | I32 | U64 | I64 | U128 | I128 | F32 | F64 | String)

and nothing more. Obviously, The Reader class could use it internally (as it provide more abstraction about reading data from a chunk of bytes) but it is not even necessary.

from rfcs.

SeanTAllen avatar SeanTAllen commented on August 16, 2024

Reader is stateful. If you are proposing adding stateless methods to it, while leaving the rest of Reader stateful, I think that's a confusing API. I think stateless methods such as these would make sense in a primitive of their own.

from rfcs.

codec-abc avatar codec-abc commented on August 16, 2024

I do agree. Also I think it makes sense to add the reverse ones at the same time, ie

fun x_to_U8_be(x: X): ReadSeq[U8]
fun x_to_U8_le(x: X): ReadSeq[U8]
fun x_to_U8[E: EndianMode](x: X): ReadSeq[U8]
fun x_to_U8_processor_endian_mode(x: X): ReadSeq[U8]

from rfcs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.