Hi, I am getting the Unicode encode error mentioned below while running

Error for wikisql.json and spider.json while generating test, train and dev split (text2sql-data issue, closed)

jkkummerfeld commented on June 6, 2024

Error for wikisql.json and spider.json while generating test, train and dev split

jkkummerfeld commented on June 6, 2024

Are you using python 2? On my Mac I found that it worked with Python 3, but not with Python 2.

anshudaur commented on June 6, 2024

Hi Jonathan,
I am using Python 3.7.6 on a Windows machine.
Thanks
Anshu

anshudaur commented on June 6, 2024

Hi, I was able to generate splits for the Advising and ATIS datasets, but I am getting this error for WikiSQL and Spider. The train set does get generated, so I can see at which line in the JSON it actually failed.
For the spider.json file, it fails after:
convert_instance --- {'query-split': 'N/A', 'sentences': [{'database': 'world_1', 'original': 'What is the total population and average area of countries in the continent of North America whose area is bigger than 3000？', 'question-split': 'dev', 'text': 'What is the total population and average area of countries in the continent of var0 whose area is bigger than var1？', 'variables': {'var0': 'North America', 'var1': '3000'}}, {'database': 'world_1', 'original': 'Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than 3000.', 'question-split': 'dev', 'text': 'Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than var1 .', 'variables': {'var1': '3000'}}], 'sql': ['SELECT AVG( SURFACEAREA ) , SUM( COUNTRYalias0.POPULATION ) FROM COUNTRY AS COUNTRYalias0 WHERE COUNTRYalias0.CONTINENT = "var0" AND SURFACEAREA > var1 ;'], 'sql-original': ['SELECT sum(Population) , avg(SurfaceArea) FROM country WHERE Continent = "North America" AND SurfaceArea > 3000'], 'variables': [{'example': 'North America', 'location': 'both', 'name': 'var0', 'type': 'unknown'}, {'example': '3000', 'location': 'both', 'name': 'var1', 'type': 'unknown'}]}

So I think the error is not related to the Python version (3.7).
Thank you so much :)
Best Regards
Anshu
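A minimal way to check the likeliest cause, under an assumption the thread does not confirm (the traceback itself is not shown): that the script writes the split files with Python's platform default encoding. On Windows that default is the ANSI code page, commonly cp1252, which has no mapping for the fullwidth question mark '？' (U+FF1F) ending the first question in the instance above; on macOS the default is UTF-8, which would be consistent with the same data converting there. The helper names below are hypothetical and not part of text2sql-data.

```python
import os
import tempfile
import unicodedata
from typing import List  # typing.List keeps this runnable on Python 3.7

# First question of the failing Spider instance. The final character is
# U+FF1F (FULLWIDTH QUESTION MARK), not the ASCII "?".
QUESTION = (
    "What is the total population and average area of countries in the "
    "continent of North America whose area is bigger than 3000\uff1f"
)


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def write_lines_utf8(path: str, lines: List[str]) -> None:
    """Hypothetical writer: pass an explicit encoding instead of relying on
    the platform default that a bare open(path, "w") would use."""
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


if __name__ == "__main__":
    last = QUESTION[-1]
    # Print the code point and name rather than the character itself, so the
    # diagnostic output is plain ASCII on any console.
    print(f"U+{ord(last):04X} {unicodedata.name(last)}")
    print("cp1252 can encode it:", can_encode(QUESTION, "cp1252"))
    print("utf-8 can encode it:", can_encode(QUESTION, "utf-8"))

    path = os.path.join(tempfile.gettempdir(), "spider_example.txt")
    write_lines_utf8(path, [QUESTION])
    with open(path, encoding="utf-8") as f:
        print("round trip ok:", f.read().rstrip("\n") == QUESTION)
```

It should report that cp1252 cannot encode the question while UTF-8 can. If that is the cause, adding encoding="utf-8" to the script's open() calls for its output files is the narrowest fix. Python 3.7 also introduced UTF-8 mode (PEP 540), which makes UTF-8 the default for open() without editing any code: run `echo data/spider.json | python -X utf8 ./tools/json_to_flat.py spider`, or set the environment variable PYTHONUTF8=1.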

jkkummerfeld commented on June 6, 2024

It's definitely a Unicode compatibility issue. I've just pushed an update that replaces all Unicode characters in Spider with their ASCII equivalents.
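One common way to approximate text in ASCII, sketched here with the standard-library unicodedata module (this is not necessarily how the pushed update works): NFKD normalization maps compatibility characters such as '？' to their plain forms and splits accented letters into a base letter plus a combining mark, after which every remaining non-ASCII code point is dropped. That last step is where information goes missing: 'Bogotá' becomes 'Bogota', and characters with no decomposition, such as 'ß' or CJK text, disappear entirely. The function names are hypothetical.

```python
import unicodedata
from typing import Any


def to_ascii(text: str) -> str:
    """Lossy ASCII approximation of text.

    NFKD turns U+FF1F (fullwidth question mark) into '?', and U+00E1
    (a with acute) into 'a' followed by U+0301 (combining acute accent).
    Encoding with errors="ignore" then drops U+0301 and anything else
    outside ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def asciify(obj: Any) -> Any:
    """Apply to_ascii to every string in a JSON-like structure (dicts, lists)."""
    if isinstance(obj, str):
        return to_ascii(obj)
    if isinstance(obj, list):
        return [asciify(item) for item in obj]
    if isinstance(obj, dict):
        return {asciify(key): asciify(value) for key, value in obj.items()}
    return obj  # numbers, booleans and None pass through unchanged


if __name__ == "__main__":
    print(to_ascii("bigger than 3000\uff1f"))
    print(to_ascii("Bogot\u00e1"))
```

Applying asciify to a whole loaded JSON file before conversion would remove the encoding problem, at the cost of silently altering any string that contained accented or non-Latin characters.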

For WikiSQL the problem is trickier. There are a lot of Unicode characters in there that can't be easily replaced without losing information (e.g. diacritical marks). I would suggest narrowing it down to one example (as you did above) and then searching for the specific Unicode character causing problems.
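Narrowing a failure down to the offending characters can be partly automated. The sketch below walks a loaded JSON file (opened explicitly as UTF-8) and lists each distinct non-ASCII character with its code point, official Unicode name, and the first place it occurs, which gives exact terms to search for. It prints names rather than the characters themselves so the report does not depend on what the console can display. The script and function names are hypothetical.

```python
import json
import sys
import unicodedata
from typing import Any, Dict, Iterator, Tuple


def non_ascii_chars(obj: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (location, char) for each non-ASCII character in string values."""
    if isinstance(obj, str):
        for ch in obj:
            if ord(ch) > 127:
                yield path, ch
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from non_ascii_chars(item, f"{path}[{i}]")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from non_ascii_chars(value, f"{path}.{key}")


def first_occurrences(obj: Any) -> Dict[str, str]:
    """Map each distinct non-ASCII character to the first location seen."""
    first: Dict[str, str] = {}
    for path, ch in non_ascii_chars(obj):
        first.setdefault(ch, path)
    return first


def report(obj: Any, limit: int = 20) -> None:
    """Print code point, Unicode name and first location of each character."""
    for ch, path in list(first_occurrences(obj).items())[:limit]:
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X}  {name:<40} first at {path}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_non_ascii.py data/wikisql.json
    with open(sys.argv[1], encoding="utf-8") as f:
        report(json.load(f))
```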

anshudaur commented on June 6, 2024

Hi, thank you so much.
I tried generating the query splits with the new spider.json file, but now only the train split is generated and the rest are empty.

Best
Anshu

jkkummerfeld commented on June 6, 2024

Hm, I'm not seeing that behaviour. I get both train and dev. Here is exactly what I ran and what I got:

~/Downloads/text2sql-data (master)$ echo data/spider.json | python ./tools/json_to_flat.py spider
~/Downloads/text2sql-data (master)$ ls -l | grep 'spider'
-rw-r--r--   1 jkk  staff   262067 Mar 31 10:54 spider.dev
-rw-r--r--   1 jkk  staff        0 Mar 31 10:54 spider.test
-rw-r--r--   1 jkk  staff  2386193 Mar 31 10:54 spider.train

Note that our data does not contain the Spider test set, which is kept private by the Yale group that developed it.

anshudaur commented on June 6, 2024

Hi Jonathan,
Thank you so much!
I converted the files on a Mac instead, as fixing the error on Windows was taking a lot of time.
For Spider there is no dev set for the query split; is that correct?
And for the WikiSQL query split, there is only a training split, with no dev or test split.
Can you please confirm that both details in the screenshot below are correct (just to make sure I got the right files)?

Thanks and Best Regards
Anshu

jkkummerfeld commented on June 6, 2024

I just applied the process on my machine and got the same file sizes.

For WikiSQL, that's correct. Our query split definition doesn't apply effectively to it (though more general forms of templating could). For Spider, we did not use it in our paper (it was published later), so we did not define a query split.
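To double-check which splits a file defines, the split labels can be tallied directly from the JSON. This sketch assumes the file is a JSON list of instances shaped like the Spider instance printed earlier in the thread: a 'query-split' value on each instance and a 'question-split' value on each entry of 'sentences'. In that example the query split is 'N/A', which presumably marks instances with no query split defined. Function names are hypothetical.

```python
import json
import sys
from collections import Counter
from typing import Any, Dict, List


def split_counts(instances: List[Dict[str, Any]]) -> Dict[str, Counter]:
    """Count query-split labels per instance and question-split labels per sentence."""
    query: Counter = Counter()
    question: Counter = Counter()
    for inst in instances:
        query[inst.get("query-split", "<missing>")] += 1
        for sentence in inst.get("sentences", []):
            question[sentence.get("question-split", "<missing>")] += 1
    return {"query-split": query, "question-split": question}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_splits.py data/spider.json
    with open(sys.argv[1], encoding="utf-8") as f:
        counts = split_counts(json.load(f))
    for kind, counter in counts.items():
        print(kind, counter.most_common())
```

If every instance reports the same query-split label, the conversion can only produce one non-empty file for that split type, which matches the empty dev and test files described above.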

Sounds like this is resolved, so I'm going to close this issue. Good luck with your work!

anshudaur commented on June 6, 2024

Thanks :)
