[BUG] [GPU Error Bug] "SELECT -2613 FROM <table> HAVING (<TIMESTAMP> NOT BETWEEN <TIMESTAMP> AND MAX(<TIMESTAMP>))" brings Error (dask-sql, open)

qwebug commented on June 10, 2024
[BUG] [GPU Error Bug] "SELECT -2613 FROM <table> HAVING (<TIMESTAMP> NOT BETWEEN <TIMESTAMP> AND MAX(<TIMESTAMP>))" brings Error

Comments (2)

charlesbluca commented on June 10, 2024

Thanks for filing, @qwebug! Over the past few months I haven't had much capacity to be active on the issue tracker here, so I apologize in advance if many of the issues you've filed don't get addressed right away. We always welcome external contributors if you have any interest in digging into this 😉 From your example, it's a little difficult to tell what in particular is causing the bug, but it looks like we're passing an object that isn't supported into cuDF's datetime column mechanics.

I'd recommend trimming your example down so the root cause is more immediately obvious. For example, I notice that the table in your example contains a SQL query. Is this relevant to the failure you encountered? If not, it might make sense to use more trivial data here, e.g. ['a', 'b', 'c'], to quickly convey "this thing doesn't work on string data in general." It's also difficult to tell which part of the query causes things to break. Do things work if we select a column instead of a scalar integer? Or if we choose a different type of scalar? Do things work if we include the MAX operation on one of the timestamps? If I were to rewrite your example, it'd probably look something like this (I haven't tested any of this locally; it's purely illustrative):

import pandas as pd
from dask_sql import Context

c = Context()

df = pd.DataFrame({
    "a": list("abcde"),
})
c.create_table('df', df, gpu=True)

# this works!
res = c.sql("SELECT -2613 FROM df HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND TIMESTAMP '2006-08-05 07:29:26')").compute()
# this doesn't work!
res = c.sql("SELECT -2613 FROM df HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
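
One way to answer the questions above systematically is to generate query variants that each change a single part of the failing query, then record which ones raise. The following is only a rough sketch: `build_variants` and `triage` are hypothetical helper names, not part of dask-sql, and none of this has been run against a GPU. Some variants (for instance, selecting a bare column alongside an aggregate in HAVING) may be rejected by the SQL planner for unrelated reasons, so the recorded error messages still need to be read.

```python
from typing import Callable, Dict

# Timestamp literals taken from the trimmed-down example above.
TS_VALUE = "TIMESTAMP '1991-02-28 13:42:12'"
TS_LOW = "TIMESTAMP '1985-12-14 23:59:41'"
TS_HIGH = "TIMESTAMP '2006-08-05 07:29:26'"


def build_variants(table: str = "df", column: str = "a") -> Dict[str, str]:
    """Return named queries that each vary one part of the failing query.

    Hypothetical helper for illustration; the variant names are arbitrary.
    """
    template = (
        "SELECT {select} FROM {table} "
        "HAVING ({value} NOT BETWEEN {low} AND {high})"
    )
    base = dict(select="-2613", table=table, value=TS_VALUE, low=TS_LOW, high=TS_HIGH)
    changes = {
        # No aggregate anywhere: the form reported as working above.
        "baseline_no_max": {},
        # MAX on the upper bound: the form reported as failing above.
        "max_on_upper_bound": {"high": f"MAX({TS_HIGH})"},
        # Does it matter which bound carries the aggregate?
        "max_on_lower_bound": {"low": f"MAX({TS_LOW})"},
        # Does the selected expression matter (column vs. scalar)?
        "select_column": {"select": column, "high": f"MAX({TS_HIGH})"},
        # Does the scalar's type matter?
        "select_string_scalar": {"select": "'x'", "high": f"MAX({TS_HIGH})"},
    }
    return {name: template.format(**{**base, **change}) for name, change in changes.items()}


def triage(run_query: Callable[[str], object], variants: Dict[str, str]) -> Dict[str, str]:
    """Run every variant and record 'ok' or the exception type and message."""
    results = {}
    for name, query in variants.items():
        try:
            run_query(query)
            results[name] = "ok"
        except Exception as exc:  # broad on purpose: we want to see every failure
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

With the `Context` from the example above, this could be driven by something like `triage(lambda q: c.sql(q).compute(), build_variants())`; passing the runner in as a callable keeps the helpers independent of whether the table was registered with `gpu=True`.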

Finally, I'm curious whether there's any additional context on how you encountered this issue (and the others you've filed). Some of these queries look like carefully designed edge cases, which are great for unit testing even if it's sometimes difficult to find the bug in them 😄
