
Comments (9)

jkkummerfeld commented on June 6, 2024

Are you using Python 2? On my Mac I found that it worked with Python 3, but not with Python 2.

from text2sql-data.

anshudaur commented on June 6, 2024

Hi Jonathan,
I am using Python 3.7.6 on a Windows machine.
Thanks
Anshu


anshudaur commented on June 6, 2024

Hi, I was able to generate splits for the Advising and ATIS datasets, but I am getting this error for WikiSQL and Spider. The train set does get generated, so I can see which line in the JSON it actually failed on.
For the spider.json file, it fails after:
convert_instance --- {'query-split': 'N/A', 'sentences': [{'database': 'world_1', 'original': 'What is the total population and average area of countries in the continent of North America whose area is bigger than 3000?', 'question-split': 'dev', 'text': 'What is the total population and average area of countries in the continent of var0 whose area is bigger than var1?', 'variables': {'var0': 'North America', 'var1': '3000'}}, {'database': 'world_1', 'original': 'Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than 3000.', 'question-split': 'dev', 'text': 'Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than var1 .', 'variables': {'var1': '3000'}}], 'sql': ['SELECT AVG( SURFACEAREA ) , SUM( COUNTRYalias0.POPULATION ) FROM COUNTRY AS COUNTRYalias0 WHERE COUNTRYalias0.CONTINENT = "var0" AND SURFACEAREA > var1 ;'], 'sql-original': ['SELECT sum(Population) , avg(SurfaceArea) FROM country WHERE Continent = "North America" AND SurfaceArea > 3000'], 'variables': [{'example': 'North America', 'location': 'both', 'name': 'var0', 'type': 'unknown'}, {'example': '3000', 'location': 'both', 'name': 'var1', 'type': 'unknown'}]}

So I think the error is not related to the Python version (3.7).
Thank you so much :)
Best Regards
Anshu


jkkummerfeld commented on June 6, 2024

It's definitely a Unicode compatibility issue. I've just pushed an update that replaces all Unicode characters in Spider with their ASCII equivalents.
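
For readers who want to try something similar on their own copy of the data, here is a minimal sketch of one common way to fold text to ASCII. It is not necessarily how the update above was produced; the function name `to_ascii` is hypothetical, and the approach (NFKD decomposition, then dropping anything that is still non-ASCII) is lossy.

```python
import unicodedata


def to_ascii(text):
    """Fold text to ASCII (hypothetical helper, lossy).

    NFKD splits an accented letter into a base letter plus combining
    marks; encoding with errors="ignore" then drops every code point
    that is still outside ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("café"))  # prints: cafe
```

Characters with no ASCII decomposition (curly quotes, for example) are removed outright rather than replaced, so the output should be checked before overwriting any data file.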

For WikiSQL the problem is trickier. There are a lot of Unicode characters in there that can't be easily replaced without losing information (e.g. diacritical marks). I would suggest narrowing it down to one example (as you did above), then searching for information about the specific Unicode character causing problems.
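
One way to do that narrowing down is to list every non-ASCII character in the file together with its position and Unicode name. The sketch below is only an illustration: the helper names `non_ascii_chars` and `scan_file` are hypothetical, and it assumes the JSON file is UTF-8 encoded.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (offset, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def scan_file(path):
    """Print every non-ASCII character in a file, line by line."""
    # Assumption: the file is UTF-8. Passing the encoding explicitly avoids
    # relying on the platform default, which can differ between systems.
    with open(path, encoding="utf-8") as src:
        for line_no, line in enumerate(src, start=1):
            for offset, ch, name in non_ascii_chars(line):
                # ascii() escapes the character so printing cannot itself
                # fail on a console that lacks the glyph.
                print(f"line {line_no}, col {offset}: "
                      f"{ascii(ch)} U+{ord(ch):04X} {name}")
```

Calling `scan_file("data/wikisql.json")` (path assumed) would print one line per offending character, which gives a concrete code point to search for.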


anshudaur commented on June 6, 2024

Hi, thank you so much.
I tried generating the query-split splits with the new spider.json file, but now only the train split is generated and the rest are empty.

Best
Anshu


jkkummerfeld commented on June 6, 2024

Hm, I'm not seeing that behaviour. I get both train and dev. Here is exactly what I ran and what I got:

~/Downloads/text2sql-data (master)$ echo data/spider.json | python ./tools/json_to_flat.py spider
~/Downloads/text2sql-data (master)$ ls -l | grep 'spider'
-rw-r--r--   1 jkk  staff   262067 Mar 31 10:54 spider.dev
-rw-r--r--   1 jkk  staff        0 Mar 31 10:54 spider.test
-rw-r--r--   1 jkk  staff  2386193 Mar 31 10:54 spider.train

Note that our data does not contain the Spider test set, which is kept private by the Yale group that developed it.


anshudaur commented on June 6, 2024

Hi Jonathan,
Thank you so much!
I got the files converted on a Mac, as fixing the error on Windows was taking a lot of time.
For Spider there is no dev set for the query split, is that correct?
And for the WikiSQL query split, there is only one training split and no dev or test splits.
Can you please confirm whether both details in the screenshot below are correct (just so I know that I got the right files)?

image

Thanks and Best Regards
Anshu


jkkummerfeld commented on June 6, 2024

I just applied the process on my machine and got the same file sizes.

For WikiSQL, that's correct. Our query split definition doesn't apply effectively to it (though more general forms of templating could). For Spider, we did not use it in our paper (it was published later), so we did not define a query split.

Sounds like this is resolved, so I'm going to close this issue. Good luck with your work!


anshudaur commented on June 6, 2024

Thanks :)

