Giter VIP home page Giter VIP logo

Comments (3)

notactuallytreyanastasio avatar notactuallytreyanastasio commented on September 10, 2024

I believe the best call here considering we have real users outside us internally is the following:

  1. Add an application level config to return the 'truth' -- which in this case is a fixed-width list of tuples that are headers and rows, and allow the user to choose their own journey

  2. Get feedback from people whom may not have encountered this issue, but think of their concerns outside ours. Because if you are using this library to query at any reasonable level in your application, this changes a huge amount of flows and expected data structures. There is no way to map against it super reliably and not have to consider a world where you cant derive the 'truth' as we are able to with our specific querying methodology here. Because while we have a reasonable workaround, it is completely unreasonable to assume that a user of this library would be OK with a whole field of their SQL query just not being returned and having whatever the latter value was overriding the first because of a name collision and a data structure not being a perfect fit.

I dont really have strong feelings here, but there is something that feels off about it working as it sits right now, and I'd just like to gather others' thoughts and let that stew for a while and hopefully make a good decision when someone has time to seriously consider this.

It seems like maybe the best short-to-mid-term decision is to keep it as maps, but CLEARLY document the shortcomings and highlight the config. But speaking on the scale of years of use, I can only imagine this creating more footguns that lead one to thinking 'do I have a bug' but really at the core its a problem in library code.

from snowflex.

hexchung avatar hexchung commented on September 10, 2024

If I'm understanding this issue correctly, I believe I am experiencing a similar issue.

I have a query like:

SELECT table1.requested_at,
       CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', table1.requested_at) AS requested_at
       requested_at
  FROM table1

where the results look like

----------------------------------------------------------------------
|         requested_at |       requested_at_2 |       requested_at_3 |
----------------------------------------------------------------------
| 2022-06-09T22:41:32Z | 2022-06-09T18:41:32Z | 2022-06-09T22:41:32Z |

Notice the reference to requested_at with no identifier is overridden by the original table value. The solution I am going with is to just change the AS name to something like requested_at_tz to differentiate.

from snowflex.

notactuallytreyanastasio avatar notactuallytreyanastasio commented on September 10, 2024

@hexchung yes this is definitely the same problem.

The core of this will be quite a doozy to fix, as we are essentially choosing the wrong data structure to do this (maps).

Here is a canned example that shows it:

Query:

SELECT
  upc.foo AS foo
  upc.bar AS bar
  upc.bizzle AS foo
FROM
  MY_NAMESPACE.whatever

We will get this in the snowflake UI

| foo | bar | foo_2|
+-------------------+
|  1  |  2  |  0   |

Which, when we do the reduction we see here will ultimately overwrite the original value of foo with 0 because while the UI shows it as foo_2 it actually is returned with the original foo key.

The solution here will potentially be to make an application-level configuration for the library which allows us to return lists of tuples, which can have this duplication, but maps as a data structure inherently will require people to work around this in ways that are similar to what you are dealing with.

Ultimately, when doing joins or selects that will result in like-column-names and you only want one, specifically noting that by being explicit with all columns and want you want vs *, or doing a different AS casting will be the best answer to get past these issues.

I am not sure when we will have bandwidth to make this change, cut a new release, etc on the PepsiCo side of things, but all pull requests are welcome. The impact on performance and massive API change in data structure makes it so that it seems we definitely cant just default back to it, but making it a configurable option seems reasonable IMO.

I have also discussed this with @jeremyowensboggs -- feel free to chime in if you have any further thoughts about this per our previous Slack discussions, Jeremy.

from snowflex.

Related Issues (13)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.