
Comments (8)

mbasmanova commented on August 17, 2024

CC: @wypb

from velox.

mbasmanova commented on August 17, 2024

CC: @pedroerp


Yuhta commented on August 17, 2024

We will probably need a virtual function on the logical type to do the comparison. The hard part is avoiding that virtual function call for common logical types, so that they see no performance regression.


mbasmanova commented on August 17, 2024

How does Presto Java do it?

Presto defines a set of operators (add, subtract, etc.) and each type is expected to provide an implementation for a subset of these that are supported.

See


Yuhta commented on August 17, 2024

I see annotations in the Java code, so some codegen magic is probably happening. The equivalent in Velox would be template magic.


pedroerp commented on August 17, 2024

Good catch. I suppose we need to provide a pluggable API for users to specify equality and comparison functions for custom logical types, similar to how this is done in C++ (operator==, ...).

Is there anything else that should be exposed? I guess at least equality and some form of comparison for sorting?

How does Presto Java do it? Or do they just have all types hard-coded throughout the codebase?


pedroerp commented on August 17, 2024

I see. That would probably mean each row comparison incurs a virtual function call? It would be nice if we could come up with a batch/vector-oriented API to amortize the cost.


oerling commented on August 17, 2024

Specifying Comparison of Extended Types

Extended types, like timestamp with time zone, must have special comparison and hashing for hash tables, and special comparison in expressions.

This can be implemented by adding virtual functions to Type. These are not defined if type->isExtendedType() is false and are defined otherwise.

The signatures are:

int32_t compare(const BaseVector& left, vector_size_t leftIndex, const BaseVector& right, vector_size_t rightIndex) const;

int32_t compare(const DecodedVector& left, vector_size_t index, void* right) const;

The first compares single elements of vectors. The second compares a DecodedVector to a slot in a RowContainer.

The return value is < 0 if left is less than right, 0 if they are equal, and > 0 if left is greater than right.

uint64_t hash(const BaseVector& vector, vector_size_t index) const;
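To make the proposed hooks concrete, here is a minimal, self-contained sketch. It does not use Velox headers: `PackedVector`, `ExtendedTypeHooks`, and `TimestampWithTimeZoneSketch` are hypothetical stand-ins, and the packing (UTC milliseconds in the upper bits, a 12-bit zone key in the lower bits) is an assumption made for illustration. The point is that comparison and hashing look only at the instant, so two values with different zones but the same instant compare equal and hash alike.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using vector_size_t = int32_t;

// Hypothetical stand-in for a flat vector of packed values (not BaseVector).
struct PackedVector {
  std::vector<int64_t> values;

  int64_t valueAt(vector_size_t index) const {
    assert(index >= 0 && static_cast<size_t>(index) < values.size());
    return values[index];
  }
};

// Hypothetical interface mirroring the virtual functions proposed for Type.
class ExtendedTypeHooks {
 public:
  virtual ~ExtendedTypeHooks() = default;
  virtual bool isExtendedType() const = 0;
  virtual int32_t compare(
      const PackedVector& left,
      vector_size_t leftIndex,
      const PackedVector& right,
      vector_size_t rightIndex) const = 0;
  virtual uint64_t hash(const PackedVector& vector, vector_size_t index) const = 0;
};

// Assumed encoding: millisUtc * 4096 + zoneKey, with zoneKey in [0, 4095].
class TimestampWithTimeZoneSketch final : public ExtendedTypeHooks {
 public:
  static int64_t pack(int64_t millisUtc, int16_t zoneKey) {
    return millisUtc * 4096 + (zoneKey & 0xFFF);
  }

  // Arithmetic right shift drops the zone key (guaranteed since C++20).
  static int64_t millisUtc(int64_t packed) {
    return packed >> 12;
  }

  bool isExtendedType() const override {
    return true;
  }

  // Returns -1, 0, or 1 based on the instant only; the zone is ignored.
  int32_t compare(
      const PackedVector& left,
      vector_size_t leftIndex,
      const PackedVector& right,
      vector_size_t rightIndex) const override {
    const int64_t l = millisUtc(left.valueAt(leftIndex));
    const int64_t r = millisUtc(right.valueAt(rightIndex));
    return l < r ? -1 : (l > r ? 1 : 0);
  }

  // Hashes the instant only, so values that compare equal hash equally.
  uint64_t hash(const PackedVector& vector, vector_size_t index) const override {
    return std::hash<int64_t>{}(millisUtc(vector.valueAt(index)));
  }
};
```

Hashing the raw packed value instead would put equal instants with different zones into different hash buckets, which is why hash has to agree with compare.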

The call sites are

  • VectorHasher: An extended type forces the use of kHash, so only the hash methods, not the value ID methods, need to know about extended types.

  • Vectors

BaseVector::equalValueAt and compare need to call the Type virtual function when the vector is of an extended type. The type's extendedness should be cached in BaseVector, similarly to the kind, so that the type does not have to be accessed.

  • HashTable and RowContainer:

HashTable in kHash mode switches on the TypeKind. Since there is no TypeKind for extended types, this switch can instead use an extended TypeKind enum with an additional value for extended types, which dispatches to Type::compare. This enum (an int) is internal to HashTable.

The same logic occurs in spilling, which compares vectors with BaseVector::compare.

  • OrderBy

This will probably work once BaseVector supports extended types.

  • Functions

The vector functions for comparison need a case for extended types. Type could have a vectorized comparison, e.g. compareMultiple(const DecodedVector& left, const DecodedVector& right, const SelectivityVector& rows, int32_t* result). This is only needed if performance is an issue.
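Two of the ideas in the call sites above can be sketched together: an internal dispatch enum that keeps built-in kinds off the virtual path, and a batch comparison that pays for one virtual call per batch rather than one per row. This is only an illustration under assumptions: `CompareKind`, `ExtendedCompare`, `IgnoreLowBitsCompare`, and `compareRow` are hypothetical names, and plain `std::vector` containers stand in for DecodedVector and SelectivityVector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hook standing in for the proposed Type virtual functions.
class ExtendedCompare {
 public:
  virtual ~ExtendedCompare() = default;

  // Per-row comparison: negative, zero, or positive.
  virtual int32_t compare(int64_t left, int64_t right) const = 0;

  // Batch comparison: writes result[i] only for rows where rows[i] is true.
  virtual void compareMultiple(
      const std::vector<int64_t>& left,
      const std::vector<int64_t>& right,
      const std::vector<bool>& rows,
      int32_t* result) const = 0;
};

// Example extended type: values compare by everything above the low 12 bits.
class IgnoreLowBitsCompare final : public ExtendedCompare {
 public:
  int32_t compare(int64_t left, int64_t right) const override {
    return compareKeys(left, right);
  }

  void compareMultiple(
      const std::vector<int64_t>& left,
      const std::vector<int64_t>& right,
      const std::vector<bool>& rows,
      int32_t* result) const override {
    assert(left.size() == rows.size() && right.size() == rows.size());
    // The loop body calls a non-virtual helper, so the batch costs a single
    // virtual dispatch.
    for (size_t i = 0; i < rows.size(); ++i) {
      if (rows[i]) {
        result[i] = compareKeys(left[i], right[i]);
      }
    }
  }

 private:
  static int32_t compareKeys(int64_t left, int64_t right) {
    const int64_t l = left >> 12;
    const int64_t r = right >> 12;
    return l < r ? -1 : (l > r ? 1 : 0);
  }
};

// Hypothetical enum internal to the hash table: built-in kinds plus one
// extra value that routes to the virtual comparison.
enum class CompareKind { kBigint, kExtended };

inline int32_t compareRow(
    CompareKind kind,
    int64_t left,
    int64_t right,
    const ExtendedCompare* extended) {
  switch (kind) {
    case CompareKind::kBigint:
      // Built-in kinds never touch the virtual function.
      return left < right ? -1 : (left > right ? 1 : 0);
    case CompareKind::kExtended:
      assert(extended != nullptr);
      return extended->compare(left, right);
  }
  return 0; // Not reached for valid enum values.
}
```

With this shape, the cost of the virtual call is paid only by columns that actually use an extended type, and the batch entry point can be added later if per-row dispatch turns out to matter.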

