Comments (4)
- "which ..." vs. "are there ..." - this was something we wrestled with during dataset creation. In the end we settled on the idea that if someone asked us "are there any classes ...", we would not simply answer "yes"; we would say "yes, there is ...", which makes the two forms equivalent. The argument could definitely be made the other way, though. The data should be consistent on this interpretation.
- Dangling AND - good catch. We did test all of the queries at some point, so either we broke this afterwards :( or SQL didn't complain. I'll make a fix.
- sql-only - these can be thought of as default values. The long-term vision was to have profiles associated with questions ("this is a question from a 1st year student in 2018") that give the context necessary for correct SQL generation, but we didn't get to it.
- 2016 vs 2017 - Hm, I thought we had caught this. The intention was to have the date set at a fixed point in time, with everything consistent relative to that. I'll add this to the list of known issues and try to get to it. I'll leave this issue open in the meantime too.
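For the dangling AND mentioned above, a rough way to scan queries for the same problem is sketched below. This is only a regex heuristic, not the fix applied to the dataset: the function name `has_dangling_connective` and the example queries are made up for illustration, and the check can misfire on string literals that happen to contain "and" before a closing parenthesis.

```python
import re

# Heuristic: a boolean connective (AND/OR) followed directly by something
# that cannot start a condition: a closing parenthesis, a semicolon, the end
# of the query, or the keyword that opens the next clause.
_DANGLING = re.compile(
    r"\b(?:AND|OR)\s*(?:\)|;|$|\b(?:GROUP\s+BY|ORDER\s+BY|HAVING|LIMIT)\b)",
    re.IGNORECASE,
)


def has_dangling_connective(sql: str) -> bool:
    """Return True if the query appears to contain an AND/OR with no right-hand condition."""
    return _DANGLING.search(sql.strip()) is not None


# Hypothetical queries, not taken from the dataset.
broken = "SELECT name FROM course WHERE department = 'EECS' AND ;"
fine = "SELECT name FROM course WHERE credits BETWEEN 1 AND 4 ORDER BY name ;"

print(has_dangling_connective(broken))
print(has_dangling_connective(fine))
```

Running every query against the actual database is a stronger check, since it also catches errors that a pattern like this cannot see.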
from text2sql-data.
Thanks for the comprehensive answer (and sorry for all these issues I'm raising!).
Quite the contrary - thank you for bringing things to our attention!
One of my hopes for this dataset is that it is not static the way many NLP datasets are. I suspect many other people came across some of the same bugs we saw in GeoQuery, ATIS, etc., but fixing corpus bugs is not a standard part of the academic process, so they didn't get fixed, which is a shame.
I've now fixed the dangling AND and looked into the 2016 vs 2017 question. All the 2017 cases are ones where the query asks about "next Winter", which means Winter 2017. This is not immediately obvious because 'Winter' is listed as a variable in the questions, so it shows up as "next semester0".
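To make the placeholder point concrete, here is a minimal sketch of filling variables back into a question. It assumes each question carries a mapping from placeholder name to value, such as `{"semester0": "Winter"}`; the helper `fill_variables` and the example question are hypothetical and not taken from the dataset or its scripts.

```python
import re
from typing import Dict


def fill_variables(text: str, variables: Dict[str, str]) -> str:
    """Replace placeholder names such as 'semester0' with their values in one pass."""
    if not variables:
        return text
    # Try longer names first so 'semester10' is not matched as 'semester1' + '0'.
    names = sorted(variables, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: variables[match.group(0)], text)


# Hypothetical anonymised question and its variable assignment.
question = "Which classes are offered next semester0 ?"
print(fill_variables(question, {"semester0": "Winter"}))
# prints: Which classes are offered next Winter ?
```

Once the placeholder is filled in, it is easier to see why "next Winter" pairs with 2017 in the SQL when the fixed reference date is in 2016.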