datafusion-contrib / datafusion-substrait
Experimental support for serializing DataFusion plans using Substrait
License: Apache License 2.0
This might need to be split into separate issues for scalar, IN, and EXISTS subqueries.
Hi, this is just a question to get more clarity on the structure of this crate.
If I'm not mistaken, this crate shares some similarities with the datafusion-proto crate. The datafusion-proto crate makes extensive use of the From trait for converting between the native DataFusion representation and the protobuf representation.
I was wondering why the current implementation of datafusion-substrait uses a different approach. I thought it might be beneficial for both crates to share a similar structure. Since the conversion to Substrait can fail, the From trait might not be applicable, but the TryFrom trait could still be used.
The current implementation uses async functions for the conversion from Substrait to DataFusion. Might this be the reason? However, I don't see why the async functions are currently necessary.
Thanks for the help.
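To make the TryFrom suggestion concrete, here is a minimal sketch of what a fallible conversion could look like. The types `WireLiteral`, `NativeScalar`, and `ConversionError` are hypothetical stand-ins invented for illustration; they are not the actual types from the substrait or datafusion crates, and this is not how either crate is implemented.

```rust
use std::convert::TryFrom;
use std::fmt;

// Hypothetical stand-in for a serialized (Substrait-like) literal.
#[derive(Debug, Clone, PartialEq)]
pub enum WireLiteral {
    I32(i32),
    I64(i64),
    // A literal kind the native side cannot represent.
    Unsupported(String),
}

// Hypothetical stand-in for the native (DataFusion-like) scalar value.
#[derive(Debug, Clone, PartialEq)]
pub enum NativeScalar {
    Int32(i32),
    Int64(i64),
}

// Hypothetical error type carrying a human-readable reason.
#[derive(Debug, Clone, PartialEq)]
pub struct ConversionError(pub String);

impl fmt::Display for ConversionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "conversion error: {}", self.0)
    }
}

impl std::error::Error for ConversionError {}

// The conversion can fail, so TryFrom is used instead of From.
impl TryFrom<&WireLiteral> for NativeScalar {
    type Error = ConversionError;

    fn try_from(lit: &WireLiteral) -> Result<Self, Self::Error> {
        match lit {
            WireLiteral::I32(v) => Ok(NativeScalar::Int32(*v)),
            WireLiteral::I64(v) => Ok(NativeScalar::Int64(*v)),
            WireLiteral::Unsupported(name) => Err(ConversionError(format!(
                "unsupported literal type: {}",
                name
            ))),
        }
    }
}
```

A caller would write `NativeScalar::try_from(&lit)?` and propagate the error. Note that TryFrom is synchronous, so it only fits conversions that do not need to await anything.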
Literal Type
One gap is that unsigned primitives are not part of Substrait's definition. In this discussion they were categorized as "third party extension defined types", but they are widely used in Arrow and DataFusion. I'd like to discuss how they could be integrated here.
For types we can use a type extension, as I do in GreptimeDB (https://github.com/GreptimeTeam/greptimedb/blob/8959dbcef83507ccd76aaaffd2b44cab6426e68f/src/common/substrait/src/types.rs): I occupy the "1" variation of the signed integer types (I8, I16, I32, I64) and translate them to the unsigned versions. I think we can do it the same way here.
Kind::I8(desc) => substrait_kind!(desc, int8_datatype, uint8_datatype),
Kind::I16(desc) => substrait_kind!(desc, int16_datatype, uint16_datatype),
Kind::I32(desc) => substrait_kind!(desc, int32_datatype, uint32_datatype),
Kind::I64(desc) => substrait_kind!(desc, int64_datatype, uint64_datatype),
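The idea behind the macro calls above can be sketched without the macro. In the sketch below, `IntKind`, `NativeType`, and the two variation constants are hypothetical names chosen for illustration (the real types live in the substrait and arrow crates); the only thing taken from the proposal is that variation 1 marks the unsigned flavor, and the assumption that the default variation 0 is the signed one.

```rust
// Hypothetical, simplified stand-in for the integer kinds in a Substrait type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IntKind {
    I8,
    I16,
    I32,
    I64,
}

// Hypothetical stand-in for the native (Arrow-like) integer data types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NativeType {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

// Assumption: variation 0 is the default (signed), variation 1 means unsigned.
pub const SIGNED_VARIATION: u32 = 0;
pub const UNSIGNED_VARIATION: u32 = 1;

// Decode: pick the signed or unsigned native type based on the variation.
pub fn to_native(kind: IntKind, variation: u32) -> Result<NativeType, String> {
    match (kind, variation) {
        (IntKind::I8, SIGNED_VARIATION) => Ok(NativeType::Int8),
        (IntKind::I16, SIGNED_VARIATION) => Ok(NativeType::Int16),
        (IntKind::I32, SIGNED_VARIATION) => Ok(NativeType::Int32),
        (IntKind::I64, SIGNED_VARIATION) => Ok(NativeType::Int64),
        (IntKind::I8, UNSIGNED_VARIATION) => Ok(NativeType::UInt8),
        (IntKind::I16, UNSIGNED_VARIATION) => Ok(NativeType::UInt16),
        (IntKind::I32, UNSIGNED_VARIATION) => Ok(NativeType::UInt32),
        (IntKind::I64, UNSIGNED_VARIATION) => Ok(NativeType::UInt64),
        (_, other) => Err(format!("unknown type variation: {}", other)),
    }
}

// Encode: every native integer type maps to a (kind, variation) pair.
pub fn from_native(ty: NativeType) -> (IntKind, u32) {
    match ty {
        NativeType::Int8 => (IntKind::I8, SIGNED_VARIATION),
        NativeType::Int16 => (IntKind::I16, SIGNED_VARIATION),
        NativeType::Int32 => (IntKind::I32, SIGNED_VARIATION),
        NativeType::Int64 => (IntKind::I64, SIGNED_VARIATION),
        NativeType::UInt8 => (IntKind::I8, UNSIGNED_VARIATION),
        NativeType::UInt16 => (IntKind::I16, UNSIGNED_VARIATION),
        NativeType::UInt32 => (IntKind::I32, UNSIGNED_VARIATION),
        NativeType::UInt64 => (IntKind::I64, UNSIGNED_VARIATION),
    }
}
```

Rejecting unknown variations instead of silently falling back to the signed type keeps a consumer from misreading plans produced by a system that uses the variation number differently.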
For literals we can do something similar, but one difference is that they also carry the data values. I plan to transmute between the signed and unsigned values, but I'm not sure this is the best way to achieve it. As far as I can tell, DuckDB doesn't cover this part either: https://github.com/duckdblabs/substrait/blob/a08fe49de78926a3112daddcb64d45a12e43950a/substrait/from_substrait.cpp#L94-L102
case substrait::Expression_Literal::LiteralTypeCase::kI8:
dval = Value::TINYINT(slit.i8());
break;
case substrait::Expression_Literal::LiteralTypeCase::kI32:
dval = Value::INTEGER(slit.i32());
break;
case substrait::Expression_Literal::LiteralTypeCase::kI64:
dval = Value::BIGINT(slit.i64());
break;
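For the literal values themselves, the reinterpretation between signed and unsigned can be sketched as follows. This is only an illustration of the idea, not the crate's implementation: the function names are hypothetical, and it uses Rust's `as` casts rather than `std::mem::transmute`, since a cast between same-width integers preserves the bit pattern and needs no unsafe code.

```rust
// Hypothetical helpers: store an unsigned value in the signed slot of the
// same width, and read it back. `as` between same-width integer types keeps
// the bits unchanged (two's complement), so the round trip is lossless.

pub fn encode_u8(value: u8) -> i8 {
    value as i8
}

pub fn decode_u8(wire: i8) -> u8 {
    wire as u8
}

pub fn encode_u32(value: u32) -> i32 {
    value as i32
}

pub fn decode_u32(wire: i32) -> u32 {
    wire as u32
}

pub fn encode_u64(value: u64) -> i64 {
    value as i64
}

pub fn decode_u64(wire: i64) -> u64 {
    wire as u64
}
```

For example, `encode_u64(u64::MAX)` yields `-1` and `decode_u64(-1)` yields `u64::MAX` again. A consumer that ignores the type variation would see the reinterpreted signed value, so this scheme only round-trips correctly between systems that agree on the variation.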
Development has moved to the core DataFusion repo