
Comments (11)

jcrist commented on May 12, 2024 4

Thanks for the ping (and the nice benchmark plot). I've actually been revisiting this opinion, and I think you've convinced me that adding simple support for dataclasses is worth it. Now that we have support for TypedDict and NamedTuple objects, asking for dataclass support isn't that far off. And compatibility with orjson is a convincing use case. Users really should use msgspec.Struct objects instead when possible (they're much faster, and have fewer weird edge cases than dataclasses), but we can do a decent job with encoding/decoding dataclasses too.

I spent some time this evening experimenting with an implementation, and I'm pretty happy with the results. Encoding time is already much faster than what orjson provides, especially when slots=True is set:

In [1]: import msgspec, orjson

In [2]: from dataclasses import dataclass

In [3]: enc = msgspec.json.Encoder()

In [4]: @dataclass
   ...: class NoSlots:
   ...:     field_one: int
   ...:     field_two: int
   ...: 

In [5]: @dataclass(slots=True)
   ...: class Slots:
   ...:     field_one: int
   ...:     field_two: int
   ...: 

In [6]: no_slots = [NoSlots(i - 1, i + 1) for i in range(10000)]

In [7]: with_slots = [Slots(i - 1, i + 1) for i in range(10000)]

In [8]: %timeit enc.encode(no_slots)  # msgspec, no slots
561 µs ± 2.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [9]: %timeit orjson.dumps(no_slots)  # orjson, no slots
834 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [10]: %timeit enc.encode(with_slots)  # msgspec, with slots
779 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [11]: %timeit orjson.dumps(with_slots)  # orjson, with slots
3.71 ms ± 90.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [12]: class Struct(msgspec.Struct):
    ...:     field_one: int
    ...:     field_two: int
    ...: 

In [13]: structs = [Struct(i - 1, i + 1) for i in range(10000)]

In [14]: %timeit enc.encode(structs)  # msgspec structs
356 µs ± 307 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

I have JSON encoding done already. Decoding will take a bit more work, but I'd estimate it will be done in the next week or two. I'll reopen this for now as a reminder.

from msgspec.

jcrist commented on May 12, 2024 3

Ok, #218 is in - msgspec now supports encoding/decoding dataclasses. This came together a lot quicker than I expected, and was pretty fun to work on. They're not as performant or featureful as Struct types, but performance isn't bad (especially for encoding).

I have a few other small fixups I'd like to get in; I'd expect a release sometime next week.


provinzkraut commented on May 12, 2024 1

It's working fantastically! I ran the benchmark again and we can see some real improvements!

[Image: rps_serialization benchmark plot]


goodboy commented on May 12, 2024

Pretty sure you can write a custom codec to handle them as per: https://jcristharif.com/msgspec/extending.html#mapping-to-from-native-types


jcrist commented on May 12, 2024

Dataclasses aren't natively supported, but could be handled with a custom enc_hook and dec_hook. However, this will be slow (much slower than using a msgspec.Struct or using a raw dict). While structs and dataclasses have similar APIs, it's the internal design of struct types that makes them so fast. We could add native support for dataclasses, which would put them somewhere between structs and dicts in encode speed (faster than a dict, slower than a struct), but for decoding they'd still be slower than a dict. If possible I recommend people just use structs - they're fast and support most of the things people want in dataclasses.


old-ocean-creature commented on May 12, 2024

"they're fast and support most of the things people want in dataclasses"

This is true, except that there is an ecosystem of tooling around dataclasses; adopting msgspec.Struct means disconnecting from that ecosystem.


jcrist commented on May 12, 2024

What libraries/tools are you thinking of that require dataclasses that you'd also like to use with msgspec?


provinzkraut commented on May 12, 2024

Hi there. So, while trying to make the switch from orjson to msgspec for Starlite, I encountered this issue. The main concern here is really performance. Dataclasses are natively supported by orjson; their serialization is therefore quite fast.
What this means for us is that making the switch to msgspec would be a massive drop in performance for our users when dataclasses are being used.

You can see below a simple benchmark result comparing the branch with orjson and the branch with msgspec.

[Image: rsz_rps_serialization benchmark plot]

I'm not sure how much work implementing dataclass support would be. I just wanted to let you know of this use case where dataclass support is kinda vital, so you can consider whether you deem it worth the effort.


Goldziher commented on May 12, 2024

Fantastic! We are eagerly awaiting it 😉


provinzkraut commented on May 12, 2024

@jcrist Thank you for your fantastic work on this and your responsiveness to the issues we raised here in general!


jakirkham commented on May 12, 2024

Thanks Jim! 🙏
