Comments (11)

akvadrako commented on September 15, 2024

and formatted more nicely:

class FastReader(Construct):
    def _parse(self, stream, context):
        return stream.read()

    def _build(self, obj, stream, context):
        stream.write(obj)

SegBody = Struct(None,
    UBInt16('size'),
    Field('data', lambda ctx: ctx['size'] - 2),
)

Seg = Struct('seg',
    Literal('\xff'),
    Byte('kind'),
    Switch('body', lambda c: c['kind'],
        {
            SOS: FastReader('data'),
        },
        default=Embed(SegBody),
    ),
)

JPEG = Struct('jpeg',
    Literal('\xff\xd8'),
    GreedyRange(Seg),
)

from construct.

MostAwesomeDude commented on September 15, 2024

Hi,

I'm not sure about the FastReader, as I don't grok that section of Construct yet.

There is a PascalString, in construct.macros, which takes a length_field as a kwarg. An example usage:

>>> from construct import PascalString, UBInt16
>>> s = PascalString("hurp", length_field=UBInt16("length"))
>>> s.parse("\x00\x05Hello")
'Hello'

Thanks for your comments. Let me know if you have any patches you wish to contribute.

akvadrako commented on September 15, 2024

Hi - the issue with PascalString is that its length field doesn't count the bytes that make up the length field itself. In several protocols we get fields like 0x0004babe, where the length (4) includes the first 2 bytes.
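
To make the wire format concrete, here is a minimal sketch in plain Python (standard library only, not Construct). The helper names parse_inclusive and build_inclusive are made up for illustration, and a 2-byte big-endian length is assumed, matching the 0x0004babe example:

```python
import struct

def parse_inclusive(data):
    """Parse one field whose 2-byte big-endian length counts its own 2 bytes."""
    (total,) = struct.unpack(">H", data[:2])
    if total < 2 or total > len(data):
        raise ValueError("bad inclusive length: %d" % total)
    # Payload starts after the length bytes and ends at offset `total`.
    return data[2:total], data[total:]

def build_inclusive(payload):
    """Inverse of parse_inclusive: the stored length includes the 2 length bytes."""
    return struct.pack(">H", len(payload) + 2) + payload

# 0x0004babe followed by one unrelated byte
payload, rest = parse_inclusive(b"\x00\x04\xba\xbe\xff")
```

Here payload holds the two bytes after the length and rest holds whatever follows the field, so the same call can be repeated on rest to walk a sequence of such fields.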

tomerfiliba commented on September 15, 2024

@akvadrako: this could be done like so

>>> s=PascalString("data", ExprAdapter(ULInt16("length"), 
...    lambda val, ctx: val + 2, lambda val, ctx: val - 2))
>>> s.parse("\x05\x00helloxxxx")
'hel'
>>> s.build("foo")
'\x05\x00foo'

on the other hand, your straightforward solution is better.

as for your FastReader class -- i would consider it bad design. i understand you simply wanted to read everything in, but it's not predictable (you can't tell how much it will read or write) and thus not symmetric. for instance, the following construct would work only in one direction:

Struct("a", 
    FastReader("blob"),
    UBInt32("x"),
)

you would be able to build anything you want, but you'll never be able to parse it back.
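
The asymmetry can be reproduced without Construct. The sketch below uses only io and struct; the build and parse functions are hypothetical stand-ins for a greedy blob followed by a 4-byte big-endian integer, not the library's actual code path:

```python
import io
import struct

def build(blob, x):
    # Building is fine: write the blob, then the 4-byte integer.
    return blob + struct.pack(">I", x)

def parse(data):
    stream = io.BytesIO(data)
    blob = stream.read()      # greedy read: consumes everything up to EOF
    tail = stream.read(4)     # nothing is left for the integer
    if len(tail) < 4:
        raise ValueError("no bytes left for x after the greedy read")
    return blob, struct.unpack(">I", tail)[0]

encoded = build(b"abc", 7)
try:
    parse(encoded)
    round_trips = True
except ValueError:
    round_trips = False
```

The greedy read swallows the integer's bytes as part of the blob, so parse can never recover x from what build produced.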

akvadrako commented on September 15, 2024

I suggested a variant to PascalString because length+data is common in network protocols and apparently JPEG too.

FastReader is the best we can do with construct's internals. Your example wouldn't work with RepeatUntil and Range either. I'm not sure it should, since constructs would need to know about future constructs and you'd get ambiguity:

Struct("a", 
    GreedyRange("b"),
    GreedyRange("c"),
)

Probably better to make a FastReadUntil('BOUNDARY').
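
A rough idea of what such a reader could do, sketched in plain Python rather than as a Construct subclass. The function name read_until, the chunk_size parameter, and the requirement that the stream be seekable are all assumptions for illustration:

```python
import io

def read_until(stream, boundary, chunk_size=4096):
    """Return the bytes before `boundary`, leaving the stream just past it."""
    start = stream.tell()
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("boundary not found before EOF")
        buf += chunk
        # Search the whole buffer so a boundary split across chunks is still found.
        idx = buf.find(boundary)
        if idx != -1:
            # Rewind to just after the boundary so later fields can be parsed.
            stream.seek(start + idx + len(boundary))
            return buf[:idx]

stream = io.BytesIO(b"blob-bytes--END--trailer")
blob = read_until(stream, b"--END--", chunk_size=4)
trailer = stream.read()
```

Because the amount consumed is determined by the data itself, a field after the blob can still be parsed, which a read-to-EOF reader cannot offer.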

MostAwesomeDude commented on September 15, 2024

Length + data is perfectly serviced by PascalString; the case where the length of the length is included in the length is actually rather uncommon though. Maybe a new String subclass is needed for it.

As far as "fast" reading, why not examine other optimizations first? There are optimization opportunities in Construct core, I think.

tomerfiliba commented on September 15, 2024

@MostAwesomeDude: no need to subclass, it would be much simpler to just define an InclusivePascalString "macro" that takes care of subtracting/adding the size of the length field from the length.

@akvadrako: your "fast" reader isn't any faster than the plain old Field except that it doesn't check the length. since this greedy construct can only appear once at the end of a data structure, i don't suppose it would make much difference in terms of speed. also, my tests back in the day showed that psyco can speed up parsing tenfold.

on the other hand, as you said, it poses a problem of breaking the symmetry between parsing and building... but i think it's inherent to the pattern and there isn't any real solution.

akvadrako commented on September 15, 2024

it's much faster - construct is unusable for parsing JPEG images without it, since 99% of the data is an unbounded blob at the end of the file.

tomerfiliba commented on September 15, 2024

if you're using GreedyRange, then yes, it would be much faster. i was talking about Field. on the other hand, Field must have a predetermined length, so it's not suitable for your purpose.

what do you mean, though, that 99% of the file is a blob? doesn't it have an internal structure? if so, i assume you have no real interest in it, so you may want to use OnDemand, so it will actually be read only when asked for.

akvadrako commented on September 15, 2024

Yes, you are correct. OnDemand doesn't help though, because it requires a known length.

tomerfiliba commented on September 15, 2024

well, i just had an idea: assuming you're working on a file/StringIO, you can write a construct that simply returns the remaining length till EOF. e.g.

p=stream.tell()
stream.seek(0, 2)
p2=stream.tell()
stream.seek(p)
return p2-p

and then you could combine it with Field and OnDemand.
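
As a standalone, runnable version of that idea, here is the same seek/tell logic wrapped in a function. The name remaining_length is made up for illustration, and wiring it into a Construct subclass, Field, or OnDemand is left out because it depends on the library version:

```python
import io

def remaining_length(stream):
    """Number of bytes between the current position and EOF (seekable streams only)."""
    pos = stream.tell()            # remember where we are
    stream.seek(0, io.SEEK_END)    # jump to the end of the stream
    end = stream.tell()
    stream.seek(pos)               # restore the original position
    return end - pos

stream = io.BytesIO(b"\xff\xd8" + b"x" * 10)
stream.read(2)                     # consume a 2-byte header
n = remaining_length(stream)
position_after = stream.tell()     # position is unchanged by the helper
```

Since the helper restores the stream position, the value it returns can serve as the length for a fixed-size read of everything that is left.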
