
Comments (4)

jkkummerfeld commented on June 14, 2024
  1. "which ..." vs. "are there ..." - this was something we wrestled with during dataset creation. In the end we settled on the idea that if someone asked us "are there any classes ...", we would not simply answer "yes"; we would say "yes, there is ...", which makes the two forms equivalent. The argument could definitely be made the other way, though. The data should be consistent on this interpretation.

  2. Dangling AND - good catch, we did test all of the queries at some point, so either we broke this after that :( or SQL didn't complain. I'll make a fix.

  3. sql-only - these can be thought of as default values. The long-term vision was that we would have profiles associated with questions ("this is a question from a 1st year student in 2018") that give context that is necessary for correct SQL generation, but we didn't get to it.

  4. 2016 vs 2017 - Hm, I thought we had caught this. The intention was to have the date set at a fixed point in time, with everything being consistent relative to that. I'll add this to the list of known issues and try to get to it. I'll leave this issue open too in the meantime.
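For anyone who wants to scan their own copy for the dangling AND problem from point 2, here is a minimal sketch of a text-level check. It is not the test we ran on the dataset; the function name `has_dangling_connective` and the regex are hypothetical, and the check ignores string literals and comments, so treat hits as candidates to inspect by hand.

```python
import re

# Hypothetical heuristic: an AND/OR that is followed directly by a closing
# parenthesis, a semicolon, a clause keyword, or the end of the string has
# no right-hand operand.
DANGLING = re.compile(
    r"\b(?:AND|OR)\s*(?=\)|;|$|\b(?:GROUP\s+BY|ORDER\s+BY|HAVING|LIMIT)\b)",
    re.IGNORECASE,
)


def has_dangling_connective(sql: str) -> bool:
    """Return True if the query text contains an AND/OR with nothing after it."""
    return DANGLING.search(sql.strip()) is not None


print(has_dangling_connective("SELECT name FROM course WHERE credits = 4 AND ;"))
print(has_dangling_connective("SELECT name FROM course WHERE credits = 4 AND dept = 'EECS'"))
```

A regex is used instead of running the queries because some SQL engines may accept or reject such a query differently, and a text scan does not need a populated database.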

from text2sql-data.

DeNeutoy commented on June 14, 2024

Thanks for the comprehensive answer (and sorry for all these issues I'm raising!).


jkkummerfeld commented on June 14, 2024

Quite the contrary - thank you for bringing things to our attention!

One of my hopes for this dataset is that it will not be static the way many NLP datasets are. I suspect many other people came across some of the same bugs we saw in GeoQuery, ATIS, etc., but fixing corpus bugs is not a standard part of the academic process, so they didn't get fixed, which is a shame.


jkkummerfeld commented on June 14, 2024

I've now fixed the dangling AND and looked into the 2016 vs 2017 question. All the 2017 cases are ones where the query asks about "next Winter", which means Winter 2017. It's not immediately clear that this is the case because "Winter" is listed as a variable in the questions (so it shows up as "next semester0").
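To see the actual wording behind a placeholder, you can substitute the variable values back into the question text. This is a minimal sketch, assuming a question template string and a name-to-value mapping; the helper `fill_variables` and the example sentence are hypothetical, not part of the released scripts.

```python
def fill_variables(template: str, variables: dict) -> str:
    """Replace each variable name in the template with its example value."""
    # Substitute longer names first so that a name such as "semester10"
    # is not partially replaced by the value for "semester1".
    for name in sorted(variables, key=len, reverse=True):
        template = template.replace(name, variables[name])
    return template


# Hypothetical example: the placeholder hides that the question says "next Winter".
question = fill_variables(
    "Which classes are offered next semester0 ?",
    {"semester0": "Winter"},
)
print(question)
```

With the value filled in, it is easier to see that the question asks about the following Winter term, which is why the corresponding SQL uses 2017.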

