Giter VIP home page Giter VIP logo

Comments (5)

antonmks avatar antonmks commented on July 28, 2024

Hi Alexey !
So far there are no NULLs in Alenka. I agree that it would be a good idea to have NULLs in database and that storing NULLs as maximum values would be the best way to implement it. However it would complicate operations on numbers - for arithmetic operations we would have to check for NULL values before adding or multiplying vectors.
It also would be necessary to implement functions (like NVL in Oracle ) to convert NULLS to some values.
So probably it is not critical to implement NULLS right now - for now NULL values become 0's and empty strings.
I would rather focus now on improving performance and may be getting some real projects to use Alenka.
Btw, the exception that you got from loading files is a bug, so I fixed it.

Best regards,

Anton

from alenka.

AlexeyAB avatar AlexeyAB commented on July 28, 2024

Hi Anton!
Big thanks for your fix!
Yes, I am now testing on a small part of the real data from the customer's project and faced with such difficulties.
For testing right now on a real project hampered by the absence of the free GPU at the customer at this moment. But I can download a small part of data, that will skipped the automatic security system to test the project on my computer.

You are right, that it would complicate operations on numbers. But when joining tables the maximum value will be much less likely to coincide with the real values, ​​than it will happen with 0. And in this case we can't solve this problem.

Please, could you implement it, an example through #define - by switching between 0 and a maximum values, for integers and decimals? And NULL strings also always remain empty - all chars are 0.

In this case, for arithmetic operations on numbers, the maximum value converts to zero through easily implementing functions Oracle decode/nvl:

SQL: select sum(nvl(field, 0)) from table;

C++: nvl(field, 0);

template <typename T>
__host__ __device__ inline T decode(T &field, T const& compare_val, T const& then_val, T const& else_val)
{
    return (field == compare_val)? then_val: else_val;
}

template<typename T>
__host__ __device__ inline T nvl(T &field, T const& then_val)
{
    decode(field, NULL_VAL, then_val, field);
}

from alenka.

antonmks avatar antonmks commented on July 28, 2024

Hi Alexey !
You are right about joining on NULL values - they can conflict with existing valid 0 values.
I will have to modify a bit the compression routines which check min and max values of vectors and also the data load procedure and arithmetic calculations as well. I suppose that you still want NULLS to be converted to 0's in calculations ? That is, 1 + 3 + NULL = 4 ? Or the Oracle way, 1 + 3 + NULL = NULL ?

Anton

from alenka.

AlexeyAB avatar AlexeyAB commented on July 28, 2024

Hi Anton!
If this is not difficult, it is better as in Oracle: 1 + 3 + NULL = NULL

  1. In expression behaves as maximum value, in summation behaves as overflow and saves its maximum value, i.e. 1 + 3 + NULL = NULL.
  2. In the aggregate functions behaves as 0. An example sum(field) equal to: sum(nvl(field, 0)). And count(field) equal to: sum(decode(field, NULL, 0, 1)).
  3. In JOIN behaves as maximum value. But NULL must not match in join. (An example, if using sort-merge-join and ordered-group-by, then before this, sorted vectors must reduce size to exclude maximum values.)

Behavior as in Oracle in the examples:

select 2 + NULL from dual;

// result: NULL

with t1 as (
    select NULL as f1, 'a' as f2 from dual 
    union all
    select 2 as f1, 'b' as f2 from dual
)
select sum(f1) from t1;

// result: 2

with t1 as (
    select NULL as f1, 'a' as f2 from dual 
    union all
    select 2 as f1, 'b' as f2 from dual
 ), t2 as (
     select NULL as f1, 'a' as f2 from dual 
    union all
    select 2 as f1, 'b' as f2 from dual
)
select * from t1 join t2 on t1.f1 = t2.f1;

// result: 2 b 2 b

Thanks for your future fixes!

from alenka.

andrebaaij avatar andrebaaij commented on July 28, 2024

Hi,

I came across the discussion, and I love to see how this project will turn out. AlexeyAB correctly explains how nulls should be handled. I just wanted to add that wikipedia has a great article on how nulls are handled: http://en.wikipedia.org/wiki/Null_(SQL)#Arithmetic_operations it references to the ISO/IEC (2003). ISO/IEC 9075-2:2003.

from alenka.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.