Giter VIP home page Giter VIP logo

Comments (7)

pedroerp avatar pedroerp commented on July 18, 2024

Is it common to find joins which aren't equi-joins in the workloads you are running?

I'm not sure we ran until this before, which I guess it's because most of the joins in our workloads are equi? Cc: @mbasmanova @Yuhta

from velox.

aditi-pandit avatar aditi-pandit commented on July 18, 2024

@pedroerp : Join planning for correlated queries often use non-equi joins. It is important for us to plan this efficiently.

I've heard in passing that most Meta queries use equi-joins. Though I remember Amit pushing for some NestedLoopJoin fixes in the past. So Meta workloads do have non-equi joins as well I think.

But the original NestedLoopJoin operators are non-Meta contribution AFAIK.

from velox.

Yuhta avatar Yuhta commented on July 18, 2024

I checked the LookupJoin in Presto a little bit and find it can do first hash join on some keys, then do range query on other keys of the matched rows to further filter down rows (like the join filter we have in velox hash join). Do we want to support this use case? Or we just want to have range query join as the main way of join?

from velox.

mbasmanova avatar mbasmanova commented on July 18, 2024

CC: @usurai

from velox.

aditi-pandit avatar aditi-pandit commented on July 18, 2024

@Yuhta : My use case if for range query join (without any equality condition). It becomes NestedLoopJoin today.

For hash join followed by range query we are fine with the HashJoin(Build/Probe) we have in Velox already.

from velox.

oerling avatar oerling commented on July 18, 2024

Presto has a join where it builds a tree and then does lookups in that. Usually, databases make durable indices and then maintain these. The Presto range lookup is not like that, it builds on demand and just like a hash join. Thre is I recall some class called IndexBuilder or such. Should there be that i Velox? How would thuis spill? Would this be like an index in a traditional database with a buffer pool and all? How would this be prtitioned? Or broadcast? If it were partitioned, it would have to have a broadcast on the probe side or a range partition. The latter would have to be adaptive, since we are talking about a query time artifact, not an index tat somebody creates and is then maintained by the system.

Imagine a minimal implementation: Build it like a hash join but make a B tree over it instead of a hash table. If there is a leading equality, the build can be partitioned on that. Otherwise there is a broadcast from build if the build is small and a broadcast on probe if it is large. Then there is a representation of a range condition.

We have found no use for this at Meta. But somebody could build thuis of course. I an specify how this is done if somebody wants to write this.

from velox.

aditi-pandit avatar aditi-pandit commented on July 18, 2024

@oerling : Agree with the idea of exploring a query time artifact instead of index in traditional database with a buffer pool.

Your suggestion to build it like a hash join but make it a B-tree with links to follow for a range search is a good starting point.

There isn't a pressing need from our side for this rightaway. But I'm intrigued to hear more about your ideas for this.

We might find interested folks to write this.

from velox.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.