Comments (11)
Thanks for the ping (and the nice benchmark plot). I've actually been revisiting this opinion, and I think you've convinced me that adding simple support for dataclasses is worth it. Now that we have support for TypedDict and NamedTuple objects, asking for dataclass support isn't that far off. And compatibility with orjson is a convincing use case. Users really should use msgspec.Struct objects instead when possible (they're much faster, and have fewer weird edge cases than dataclasses), but we can do a decent job with encoding/decoding dataclasses too.
I spent some time this evening experimenting with an implementation, and I'm pretty happy with the results. Encoding time is already much faster than what orjson provides, especially when slots=True is set:
In [1]: import msgspec, orjson
In [2]: from dataclasses import dataclass
In [3]: enc = msgspec.json.Encoder()
In [4]: @dataclass
   ...: class NoSlots:
   ...:     field_one: int
   ...:     field_two: int
   ...:
In [5]: @dataclass(slots=True)
   ...: class Slots:
   ...:     field_one: int
   ...:     field_two: int
   ...:
In [6]: no_slots = [NoSlots(i - 1, i + 1) for i in range(10000)]
In [7]: with_slots = [Slots(i - 1, i + 1) for i in range(10000)]
In [8]: %timeit enc.encode(no_slots) # msgspec, no slots
561 µs ± 2.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [9]: %timeit orjson.dumps(no_slots) # orjson, no slots
834 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [10]: %timeit enc.encode(with_slots) # msgspec, with slots
779 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [11]: %timeit orjson.dumps(with_slots) # orjson, with slots
3.71 ms ± 90.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [12]: class Struct(msgspec.Struct):
    ...:     field_one: int
    ...:     field_two: int
    ...:
In [13]: structs = [Struct(i - 1, i + 1) for i in range(10000)]
In [14]: %timeit enc.encode(structs) # msgspec structs
356 µs ± 307 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
I have JSON encoding done already. Decoding will take a bit more work, but I'd estimate it will be done in the next week or two. I'll reopen this for now as a reminder.
from msgspec.
Ok, #218 is in: msgspec now supports encoding/decoding dataclasses. This came together a lot quicker than I expected, and was pretty fun to work on. They're not as performant or featureful as Struct types, but performance isn't bad (especially for encoding).
I have a few other small fixups I'd like to get in; I'd expect a release sometime next week.
from msgspec.
It's working fantastic! I ran the benchmark again and we can see some real improvements!
from msgspec.
Pretty sure you can write a custom codec to handle them as per: https://jcristharif.com/msgspec/extending.html#mapping-to-from-native-types
from msgspec.
Dataclasses aren't natively supported, but could be handled with a custom enc_hook and dec_hook. However, this will be slow (much slower than using a msgspec.Struct or using a raw dict). While structs and dataclasses have similar APIs, it's the internal design of struct types that makes them so fast. We could add native support for dataclasses, which would put them somewhere between structs and dicts in encode speed (faster than a dict, slower than a struct), but for decoding they'd still be slower than a dict. If possible I recommend people just use structs: they're fast and support most of the things people want in dataclasses.
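A rough sketch of the hook approach described above. To keep it runnable without msgspec installed, it uses the standard library json module as a stand-in: json.dumps(default=...) plays roughly the role that enc_hook plays in msgspec, and the dec_hook below is a plain helper called by hand, written with the same (type, obj) argument order msgspec uses. Point and both hook bodies are hypothetical illustrations, not msgspec code.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


# Hypothetical dataclass used only for illustration.
@dataclass
class Point:
    field_one: int
    field_two: int


def enc_hook(obj):
    """Turn a dataclass instance into a dict the encoder already understands."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    raise TypeError(f"Objects of type {type(obj).__name__} are not supported")


def dec_hook(type_, obj):
    """Rebuild a dataclass of the requested type from a decoded dict."""
    if is_dataclass(type_) and isinstance(obj, dict):
        return type_(**obj)
    raise TypeError(f"Cannot build {type_!r} from {type(obj).__name__}")


points = [Point(1, 2), Point(3, 4)]

# The hook runs once per dataclass instance, in Python, during encoding.
encoded = json.dumps(points, default=enc_hook)

# Decoding yields plain dicts first; the hook then converts each one.
decoded = [dec_hook(Point, item) for item in json.loads(encoded)]
```

Every object passes through a Python-level function call and an intermediate dict in both directions, which is the overhead the comment above warns about.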
from msgspec.
"they're fast and support most of the things people want in dataclasses"
This is true, except there is an ecosystem of tooling around dataclasses; adopting msgspec.Struct means disconnecting from that ecosystem.
from msgspec.
What libraries/tools are you thinking of that require dataclasses that you'd also like to use with msgspec?
from msgspec.
Hi there. So, while trying to make the switch from orjson to msgspec for Starlite, I encountered this issue. The main concern here is really performance. Dataclasses are natively supported by orjson, so their serialization is quite fast.
What this means for us is that making the switch to msgspec would be a massive drop in performance for our users when dataclasses are being used.
You can see below a simple benchmark result comparing the branch with orjson and the branch with msgspec.
I'm not sure how much work implementing dataclass support would be. I just wanted to let you know of this use case where dataclass support is kinda vital, so you can consider whether you deem it worth the effort.
from msgspec.
Fantastic! We are eagerly awaiting 😉
from msgspec.
@jcrist Thank you for your fantastic work on this and your responsiveness to the issues we raised here in general!
from msgspec.
Thanks Jim! 🙏
from msgspec.