Comments (9)
Are you using Python 2? On my Mac I found that it worked with Python 3, but not with Python 2.
from text2sql-data.
Hi Jonathan,
I am using Python 3.7.6 on a Windows machine.
Thanks
Anshu
Hi, I was able to generate the splits for the advising and atis datasets, but I am getting this error for wikisql and spider. The train file does get partly generated, so I can see at which line of the JSON it actually failed.
For the spider.json file, it fails after:
convert_instance --- {'query-split': 'N/A', 'sentences': [{'database': 'world_1', 'original': 'What is the total population and average area of countries in the continent of North America whose area is bigger than 3000?', 'question-split': 'dev', 'text': 'What is the total population and average area of countries in the continent of var0 whose area is bigger than var1?', 'variables': {'var0': 'North America', 'var1': '3000'}}, {'database': 'world_1', 'original': 'Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than 3000.', 'question-split': 'dev', 'text': 'Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than var1 .', 'variables': {'var1': '3000'}}], 'sql': ['SELECT AVG( SURFACEAREA ) , SUM( COUNTRYalias0.POPULATION ) FROM COUNTRY AS COUNTRYalias0 WHERE COUNTRYalias0.CONTINENT = "var0" AND SURFACEAREA > var1 ;'], 'sql-original': ['SELECT sum(Population) , avg(SurfaceArea) FROM country WHERE Continent = "North America" AND SurfaceArea > 3000'], 'variables': [{'example': 'North America', 'location': 'both', 'name': 'var0', 'type': 'unknown'}, {'example': '3000', 'location': 'both', 'name': 'var1', 'type': 'unknown'}]}
So I think the error is not related to the Python version (3.7).
Thank you so much :)
Best Regards
Anshu
It's definitely a Unicode compatibility issue. I've just pushed an update that replaces all Unicode characters in Spider with their ASCII equivalents.
For WikiSQL the problem is trickier. There are a lot of Unicode characters in there that can't be easily replaced without losing information (e.g. diacritical marks). I would suggest narrowing it down to one example (as you did above), then searching for information about the specific Unicode character causing problems.
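One way to narrow it down is to list every non-ASCII character in the file along with its position and Unicode name. The following is only a minimal sketch, not part of the repository's tools: the function names `find_non_ascii` and `summarize_file` are hypothetical, and it assumes the data file is UTF-8 encoded. The encoding is passed explicitly because Python 3 on Windows otherwise falls back to the locale's default encoding, which may be one source of the difference between platforms (not confirmed in this thread).

```python
import unicodedata
from collections import Counter


def find_non_ascii(lines):
    """Yield (line_number, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to "UNKNOWN" for code points without a Unicode name
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


def summarize_file(path):
    """Count how often each non-ASCII character occurs in a file.

    Assumption: the file is UTF-8 encoded. The encoding is given
    explicitly so the result does not depend on the platform default.
    """
    with open(path, encoding="utf-8") as handle:
        return Counter(ch for _, _, ch, _ in find_non_ascii(handle))
```

Calling `summarize_file("data/wikisql.json")` (path assumed) and printing the most common entries should show which characters to look up first.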
Hi, thank you so much.
I tried the new spider.json file to generate splits for the query split, but now only the train split is generated and the rest are empty.
Best
Anshu
Hm, I'm not seeing that behaviour. I get both train and dev. Here is exactly what I ran and what I got:
~/Downloads/text2sql-data (master)$ echo data/spider.json | python ./tools/json_to_flat.py spider
~/Downloads/text2sql-data (master)$ ls -l | grep 'spider'
-rw-r--r-- 1 jkk staff 262067 Mar 31 10:54 spider.dev
-rw-r--r-- 1 jkk staff 0 Mar 31 10:54 spider.test
-rw-r--r-- 1 jkk staff 2386193 Mar 31 10:54 spider.train
Note that our data does not contain the Spider test set, that is kept private by the Yale group that developed it.
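To check which splits a data file can produce before converting it, one could tally the split labels directly. This is a rough sketch rather than one of the repository's tools: `count_splits` and `count_splits_in_file` are hypothetical names, and the code assumes the file is a JSON list of entries shaped like the one quoted earlier in this thread, with a `query-split` field per entry and a `question-split` field per sentence.

```python
import json
from collections import Counter


def count_splits(entries):
    """Tally 'query-split' per entry and 'question-split' per sentence."""
    query_counts = Counter()
    question_counts = Counter()
    for entry in entries:
        # Entries without the field are counted under "missing"
        query_counts[entry.get("query-split", "missing")] += 1
        for sentence in entry.get("sentences", []):
            question_counts[sentence.get("question-split", "missing")] += 1
    return query_counts, question_counts


def count_splits_in_file(path):
    """Load a JSON data file (assumed to be a list of entries) and tally its splits."""
    with open(path, encoding="utf-8") as handle:
        return count_splits(json.load(handle))
```

A label that never appears in the tallies corresponds to an output file that stays empty for that split.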
Hi Jonathan,
Thank you so much!
I got the files converted on a Mac, as fixing the error on Windows was taking a lot of time.
For Spider there is no dev set for the query split; is that correct?
And for the WikiSQL query split, there is only one training split and no dev or test splits.
Can you please confirm that both details in the screenshot below are correct (just so I know that I got the right files)?
Thanks and Best Regards
Anshu
I just applied the process on my machine and got the same file sizes.
For WikiSQL, that's correct. Our query split definition doesn't apply effectively to it (though more general forms of templating could). For Spider, we did not use it in our paper (it was published later), so we did not define a query split.
Sounds like this is resolved, so I'm going to close this issue. Good luck with your work!
Thanks :)