Comments (3)
Hey @rupurt!
I'm not planning to support Apache Arrow or port OctoSQL to a vectorised execution engine. It's not worth the effort. Re the thesis: it's old, and OctoSQL has been rewritten from scratch since. It's around 100x faster now, thanks to static typing and the way the execution phase now works.
In general, if you're working with data where the speedup of a columnar engine would be worth it, just use https://github.com/apache/arrow-datafusion or a project built around it. It has much more manpower behind it. Arrow is a PITA to code around, especially when you want union types, repetition, and deeply nested data structures. Additionally, the Go Arrow library is way behind the Rust one (or others).
To answer your last question, if you'd like to port OctoSQL to Arrow, please fork. As far as improvements go, they're welcome! However, please first create issues to discuss the details of the contributions.
If you'd like to attempt this redesign on your own, your best bet is to keep the physical phase but rewrite the execution phase almost completely. Here's an experiment you can use as inspiration: https://github.com/cube2222/octosql/tree/vectorization-experiment2
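For readers unfamiliar with the distinction discussed above, here is a minimal Go sketch contrasting row-at-a-time execution with a vectorised (columnar batch) style. It is an illustration only: `Row`, `Batch`, and all function names are hypothetical, and it uses neither OctoSQL's actual execution code nor the Arrow Go API.

```go
package main

import "fmt"

// Row is a hypothetical row-oriented record: one struct per input row.
type Row struct {
	Price    float64
	Quantity int64
}

// Batch is a hypothetical columnar batch: one slice per column.
type Batch struct {
	Price    []float64
	Quantity []int64
}

// revenueRowAtATime evaluates SUM(price * quantity) WHERE quantity > minQty
// by visiting one row at a time, filtering and aggregating in a single pass.
func revenueRowAtATime(rows []Row, minQty int64) float64 {
	var total float64
	for _, r := range rows {
		if r.Quantity > minQty {
			total += r.Price * float64(r.Quantity)
		}
	}
	return total
}

// toBatch converts rows into the columnar layout.
func toBatch(rows []Row) Batch {
	b := Batch{
		Price:    make([]float64, len(rows)),
		Quantity: make([]int64, len(rows)),
	}
	for i, r := range rows {
		b.Price[i] = r.Price
		b.Quantity[i] = r.Quantity
	}
	return b
}

// selectQuantityGreater is the vectorised filter: it scans a single column
// and returns a selection vector holding the indices of matching rows.
func selectQuantityGreater(b Batch, minQty int64) []int {
	sel := make([]int, 0, len(b.Quantity))
	for i, q := range b.Quantity {
		if q > minQty {
			sel = append(sel, i)
		}
	}
	return sel
}

// sumRevenue is the vectorised aggregate: it reads only the selected
// positions of the two columns it needs.
func sumRevenue(b Batch, sel []int) float64 {
	var total float64
	for _, i := range sel {
		total += b.Price[i] * float64(b.Quantity[i])
	}
	return total
}

func main() {
	rows := []Row{
		{Price: 2.5, Quantity: 4},
		{Price: 10, Quantity: 1},
		{Price: 3, Quantity: 6},
	}
	fmt.Println(revenueRowAtATime(rows, 1))

	b := toBatch(rows)
	fmt.Println(sumRevenue(b, selectQuantityGreater(b, 1)))
}
```

Both paths compute the same result. The vectorised one splits the work into tight per-column loops, which is roughly the kind of restructuring a rewrite of the execution phase would involve.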
from octosql.
Thank you for the info and leads @cube2222. I definitely noticed that you added a dataflow engine which is one of the reasons I'm stoked to comprehend and work with your project!
I'm totally with you regarding DataFusion having a larger community and more progress. But my personal belief is that no one has really nailed the serving layer for small/medium/big data, besides maybe Presto, which is JVM-based. I also believe that once we see the right tool, every language will implement a version, and Go is a sleeping giant with a huge and growing fan base.
Also, sometimes it's just fun to hack on interesting stuff 😄
DataFusion and Arrow are certainly cool and have momentum.
OctoSQL is a nice, low-ceremony alternative. I prefer OctoSQL. I'll try to make PRs for a few things.
@cube2222 it would be good to have a roadmap and to triage the issues into agreed pieces of work that people can then pick up.