beachwang / dail-sql Goto Github PK
View Code? Open in Web Editor NEWA efficient and effective few-shot NL2SQL method on GPT-4.
License: Apache License 2.0
A efficient and effective few-shot NL2SQL method on GPT-4.
License: Apache License 2.0
Hello,
I just read your work and find that you add foreign keys to BS_p
, OD_p
, TR_p
, and AS_p
prompt formats. I'm curious about these formats with foreign keys, but I can't find details of these formats in your paper. Did I missed something or could you just give some examples about these formats?
Thank you.
Hello!
Thanks for great work! when I followed the instruction I got stuck when I ran the data_preprocess.py, it told me that 'Error while loading a tagger model (probably missing model file). Why did this situation occur? How to resolve this problem?
Thanks a lot!
Thanks for sharing the DAIl-SQL codes. I would like to ask when the question is in Chinese, even if you create the Chinese-English mapping table in advance, there will be problems with table positioning and column selection. Do you have any suggestions for this problem?
I have read your paper and replicated your experiment, but there's something I don't quite understand. Perhaps I missed it, but could you please point out to me how the preliminary model graphix is introduced in your code? I would greatly appreciate it!
Hello, I just read your paper and am taking a look at the code. Thank you for your contributions.
There was one element in the paper that I did not fully understand which I hope you can assist me with.
Where does the pre-predicted s' come from? Does this arrive from a separate, preliminary model, and is this model within the codebase?
python generate_question.py \
--data_type spider \
--split test \
--tokenizer gpt-3.5-turbo \
--max_seq_len 4096 \
--selector_type EUCDISMASKPRESKLSIMTHR \
--pre_test_result [your_pre_generated_queries_file] \
--prompt_repr SQL \
--k_shot 9 \
--example_type QA
I saw that in an example run setup for considering both question similarity and query similarity, the "pre_test_result" flag was included. I am unsure where these pre generated queries arrive from, and would greatly appreciate an explanation. Thank you.
Can't download
Can you provide code for calculating the EM accuracy and EX accuracy of prediction results?
Can you provide the code for evaluation?
[UPDATE] Nevermind, your_pre_generated_queries_file is already included at ./results/DAIL-SQL+GPT-4.txt
I hope all is well, I'm attempting to re-implement this research paper, and I was wondering if there is any chance that you have a pre_generated_queries file that I could use to do some testing of the functionality. If not, no worries. Thanks and the paper is awesome, great work!
python generate_question.py
--data_type spider
--split test
--tokenizer gpt-3.5-turbo
--max_seq_len 4096
--selector_type EUCDISMASKPRESKLSIMTHR
--pre_test_result [your_pre_generated_queries_file]
--prompt_repr SQL
--k_shot 9
--example_type QA
Hi,
Like BugMaker-Boyan, I'm super grateful that you have added the BIRD-SQL results -- it's definitely the best benchmark in Text-to-SQL right now.
I'm strapped for compute at the moment (aren't we all), and don't think I can afford to run the pre_test_result generation phase.
I want to check that the url below, contains the pre_test_results you used to Bird-SQL.
If they are not the correct pre_test_results, I was wondering if you could release the pre_test_results you originally generated, similar to the results/graphix_result.txt for Spider dev?
I'm excited to see what you and your team do next!
Hello, I just read your paper and am looking at the code. I appreciate your efforts for this new method. I want to use this method on my database and I am not getting it after reading this how can I do this or it is just for spider data? I would be grateful if you could guide us on how can we use on own data.
Thanks !!
Hello, first of all I would like to congratulate you for the great work! I have been focusing on the code but I am having difficulties in understanding the process and steps in order to generate the jsonl file of spider data of test_schema_linking and train_schema_linking, could you help me in this ?
hi,thanks for sharing DAIl-SQL codes.
We notice the improvements by DAIL demo selection(EuclideanDistanceQuestionMaskPreSkeletonSimilarThresholdSelector)
Here is my question:
When evaluating at Spider-Test, did you add Spider-dev into the demo sets?
Hi!
In the scripts provided the schema linking is only for the dev and train_spider sets, which have the same databases. To get the results reported in the paper for the test set, which has a different set of tables, how do I do the schema linking? Do I include every table in the database folder and in the tables file (having training and dev), or this will lower the results? Or do I alter the script in preprocess to do a separate schema link for the test?
Thanks
Anton
Hello, I saw that in your code, "train_spider_and_others.json" and "train_others.json" were merged to get train_spider_and_others. Would you like to ask if you used all the spider data sets in your paper, or just the dev.json data set?
I see that the default data in generate_question.py uses dev.json
Did you train the model using train_spider_and_others and then test it using dev.json?
Hi,
I would appreciate it if you could provide me with the evaluation script since there is no information about it in the repository. Although I am aware that you use some other packages for evaluation, I am not sure how you utilized them to assess the results.
The error comes from here:
if jaccard_similarity(train_json[index]["pre_skeleton"], target["pre_skeleton"]) < self.threshold:
the details as below:
File "/home/jiangshan/code/DAIL-SQL/prompt/ExampleSelectorTemplate.py", line 353, in get_examples
if jaccard_similarity(train_json[index]["pre_skeleton"], target["pre_skeleton"]) < self.threshold:
File "/home/jiangshan/code/DAIL-SQL/prompt/PromptICLTemplate.py", line 51, in format
examples = self.get_examples(target, self.NUM_EXAMPLE * scope_factor, cross_domain=cross_domain)
File "/home/jiangshan/code/DAIL-SQL/generate_question.py", line 87, in
question_format = prompt.format(target=question_json,
KeyError: 'pre_skeleton'
index = 2039
train_json[index]:
{'db_id': 'party_people', 'query': 'SELECT count() FROM region', 'query_toks': ['SELECT', 'count', '(', '', ')', 'FROM', 'region'], 'query_toks_no_value': ['select', 'count', '(', '*', ')', 'from', 'region'], 'question': 'How many regions do we have?', 'question_toks': ['How', 'many', 'regions', 'do', 'we', 'have', '?'], 'sql': {'from': {...}, 'select': [...], 'where': [...], 'groupBy': [...], 'having': [...], 'orderBy': [...], 'limit': None, 'intersect': None, 'union': None, 'except': None}, 'tables': [{...}, {...}, {...}, {...}], 'query_skeleton': 'select count ( _ ) from _', 'path_db': '/home/jiangshan/data/datasets/llm/spider/database/party_people/party_people.sqlite', 'sc_link': {'q_col_match': {...}, 'q_tab_match': {...}}, 'cv_link': {'num_date_match': {}, 'cell_match': {}}, 'question_for_copying': ['how', 'many', 'regions', 'do', 'we', 'have', '?'], 'column_to_table': {'0': None, '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 1, '8': 1, '9': 1, '10': 1, '11': 1, '12': 1, '13': 2, '14': 2, '15': 2, '16': 2, '17': 3, '18': 3, ...}, 'table_names_original': ['region', 'party', 'member', 'party_events'], 'question_pattern': 'how many _ do we have ?', 'pre_skeleton': 'select count ( _ ) from _'}
The target is :
{'db_id': 'concert_singer', 'query': 'SELECT count() FROM singer', 'query_toks': ['SELECT', 'count', '(', '', ')', 'FROM', 'singer'], 'query_toks_no_value': ['select', 'count', '(', '*', ')', 'from', 'singer'], 'question': 'How many singers do we have?', 'question_toks': ['How', 'many', 'singers', 'do', 'we', 'have', '?'], 'sql': {'from': {...}, 'select': [...], 'where': [...], 'groupBy': [...], 'having': [...], 'orderBy': [...], 'limit': None, 'intersect': None, 'union': None, 'except': None}, 'tables': [{...}, {...}, {...}, {...}], 'query_skeleton': 'select count ( _ ) from _', 'path_db': '/home/jiangshan/data/datasets/llm/spider/database/concert_singer/concert_singer.sqlite', 'sc_link': {'q_col_match': {...}, 'q_tab_match': {...}}, 'cv_link': {'num_date_match': {}, 'cell_match': {}}, 'question_for_copying': ['how', 'many', 'singers', 'do', 'we', 'have', '?'], 'column_to_table': {'0': None, '1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 1, '9': 1, '10': 1, '11': 1, '12': 1, '13': 1, '14': 1, '15': 2, '16': 2, '17': 2, '18': 2, ...}, 'table_names_original': ['stadium', 'singer', 'concert', 'singer_in_concert'], 'question_pattern': 'how many _ do we have ?'}
The debug args are:
"args": [
"--data_type", "spider",
"--split", "test",
"--tokenizer", "/home/schinta/data/model/llm/pre_train/THUDM/chatglm3-6b",
"--max_seq_len", "4096",
"--selector_type", "EUCDISMASKPRESKLSIMTHR",
"--prompt_repr", "SQL",
"--k_shot", "9",
"--example_type", "QA"
]
The pip list show as below:
Package Version
accelerate 0.28.0
aiofiles 23.2.1
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.2.0
annotated-types 0.6.0
annoy 1.17.1
anyio 3.7.0
async-timeout 4.0.2
attrs 23.1.0
bpemb 0.3.5
certifi 2024.2.2
charset-normalizer 3.1.0
click 8.1.7
cmake 3.26.3
contourpy 1.1.1
corenlp-protobuf 3.8.0
cpm-kernels 1.0.11
cycler 0.12.1
dataclasses-json 0.5.7
distro 1.9.0
exceptiongroup 1.1.1
ffmpy 0.3.2
filelock 3.12.0
fonttools 4.50.0
frozenlist 1.3.3
fsspec 2023.5.0
gensim 4.3.2
greenlet 2.0.2
h11 0.14.0
httpcore 1.0.4
httpx 0.27.0
huggingface-hub 0.23.0
idna 3.7
importlib_resources 6.4.0
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
latex2mathml 3.77.0
lit 16.0.6
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
matplotlib 3.7.5
mdtex2html 1.3.0
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
mypy-extensions 1.0.0
nemoguardrails 0.3.0
networkx 3.1
nltk 3.8.1
numexpr 2.8.4
numpy 1.24.4
nvidia-cublas-cu11 11.10.3.66
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu11 10.9.0.58
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.2.10.91
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu11 11.7.4.91
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu11 2.14.3
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu11 11.7.91
nvidia-nvtx-cu12 12.1.105
openai 1.30.1
openapi-schema-pydantic 1.2.4
orjson 3.9.15
packaging 24.0
pandas 2.0.3
pillow 10.2.0
pip 24.0
pkgutil_resolve_name 1.3.10
protobuf 3.20.3
psutil 5.9.8
pydantic 2.7.1
pydantic_core 2.18.2
pydub 0.25.1
Pygments 2.17.2
pyparsing 3.1.2
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0
referencing 0.34.0
regex 2023.5.5
requests 2.31.0
rfc3986 1.5.0
rich 13.7.1
rpds-py 0.18.0
ruff 0.3.4
safetensors 0.3.1
scikit-learn 1.2.2
scipy 1.10.1
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 65.5.1
shellingham 1.5.4
simpleeval 0.9.13
six 1.16.0
smart-open 7.0.4
sniffio 1.3.0
sql_metadata 2.11.0
SQLAlchemy 2.0.17
sqlparse 0.5.0
stanford-corenlp 3.9.2
sympy 1.12
threadpoolctl 3.1.0
tokenizers 0.13.3
tomlkit 0.12.0
toolz 0.12.1
torch 2.3.0
torchtext 0.18.0
torchvision 0.18.0
tqdm 4.65.0
transformers 4.27.1
triton 2.3.0
typer 0.10.0
typing_extensions 4.11.0
typing-inspect 0.9.0
tzdata 2024.1
urllib3 2.2.1
uvicorn 0.22.0
websockets 11.0.3
wheel 0.43.0
wrapt 1.16.0
yarl 1.9.2
zipp 3.18.1
ChatGPT4 API calls are expensive, can you provide the pre-generated SQL query dataset which is generated by GPT-4 for EUCDISMASKPRESKLSIMTHR?
Thanks for your nice work.
Can you open source code & dev-predicted-sqls for re-implement BIRD dataset? I cannot find related code in this repo.
Thanks.
Line 112 in d78ce34
Need to add data_type = args.data_type
in the previous line to avoid the following error:
Traceback (most recent call last):
File "data_preprocess.py", line 112, in
if data_type == "spider":
NameError: name 'data_type' is not defined
I am very interested in your articles and code. When I read the code, I found that you use sentence-transformers to calculate Euclidean distance to extract examples. At the same time, I also noticed that you have extracted the question skeleton and SQL skeleton and they were used when generating question.json.
def __init__(self, tokenizer: str, *args, **kwargs):
self.tokenizer = get_tokenizer(tokenizer)
self.example_qualities = []
self.pattern_similarities = []
def record_example_quality(self, examples, target):
quality_list = []
for example in examples:
quality_list.append(jaccard_similarity(example["query_skeleton"], target["query_skeleton"]))
self.example_qualities.append(quality_list)
def get_example_quality(self):
if self.example_qualities:
return np.mean([num for row in self.example_qualities for num in row])
else:
return 1
def get_example_quality_for_each(self):
if self.example_qualities:
return [np.mean(row) for row in self.example_qualities]
else:
return []
def record_pattern_similarity(self, examples, target):
similarity_list = []
for example in examples:
similarity_list.append(jaccard_similarity(example["question_pattern"], target["question_pattern"]))
self.pattern_similarities.append(similarity_list)
def get_pattern_similarity(self):
if self.pattern_similarities:
return np.mean([num for row in self.pattern_similarities for num in row])
else:
return 1
However, to my surprise, you only used the skeleton to calculate the average for output and did not use them in ask_llm.py
.
So I'm curious about what is the purpose of extracting skeletons?
I am very interested in your work, could you please help me clarify my doubts? Thank you.
Dear All,
Would it be possible to share zero-shot results for bird dataset?
Thank you!
I eval the spider-dev using official test-suit-sql-eval scripts, with the GPT-4 results in your repo, but the execution accuracy is different.
The paper result: 83.5%
But I get: 76.2
run cmd:
python3 evaluation.py --gold ./my_test/gold_sqls.txt --pred ./my_test/DAIL-SQL+GPT-4+self-consistency.txt --db ./database/database/ --etype exec
easy medium hard extra all
count 248 446 174 166 1034
===================== EXECUTION ACCURACY =====================
execution 0.903 0.818 0.661 0.506 0.762
Hi!
Thanks for the great work! I followed the instructions in the readme but got no lock while running python data_preprocess.py
It stucks at "test section linking: " at 0%. Any clues on what could be the possible reason?
Thanks a lot!
Hello, I would like to ask, in the text2sql experiment,
is your database schema bound with natural language (referring to the question based on the database schema),
or is the natural language question independent of the database schema (according to the natural language question to find the matching database schema into the corresponding prompt)?
I would like to understand the performance of DIAL-SQL on the bird dataset under GPT-3.5-turbo. May I ask if you have any results?
Hi there!
I was initially implementing DAIL-SQL, and the version I pulled at the beginning of December worked perfectly fine. However, with the new commits and the updated version, I'm encountering the following error when I run this command:
python generate_question.py \
-- data_type spider \
--split test \
--tokenizer gpt-3.5-turbo \
--max_seq_len 4096 \
--prompt_repr SQL \
--k_shot 9 \
--example_type QA \
--selector_type EUCDISQUESTIONMASK
Error:
File "generate_question.py", line 29, in <module>
REPR_TYPE.OPENAI_DEMOSTRATION_WFK,
AttributeError: type object 'REPR_TYPE has no attribute 'OPENAI_DEMOSTRATION_WFK'
Looking at the source code, it seems that I only see Spider using RAT SQL to calculate q-s linking code, but I do not see bird using RAT SQL to calculate q-s linking code.In the exposed code, bird is more like a few shot without q-s linking:
I would like to ask whether the result on the list uses RAT to calculate q-s linking? Thank you.
Hi!
Thanks for the paper and this code. Really insightful for Text2SQL work!
I want to clarify if my understanding of the setup as described below is correct:
Given the parameter messages
in a format as shown here: https://platform.openai.com/docs/api-reference/chat/create,
In your work for the role: system
, the content
is set as the default value of "You are a helpful assistant."
and the content
corresponding to role: user
is what is modified using the different prompt templates like ODp and others.
OR
Do you change the content
corresponding to role: system
to some other value?
I am a little dizzy about "data_preprocess.py". Could u please tell me roughly how it works?
I am interested in your research and admire your state-of-the-art results, but I have two questions:
First, I tested with "--selector_type EUCDISMASKPRESKLSIMTHR" and gpt-3.5-turbo. These are the results. I don’t understand why the execution accuracy is so low (72.3%). Could you please help me with this problem?
PS C:\Users\86158\Desktop\text2sql\test-suite-sql-eval-master\test-suite-sql-eval-master> python evaluation.py --gold dev_gold.txt --pred RESULTS_MODEL-gpt-3.5-turbo.txt --db C:\Users\86158\Desktop\text2sql\test-suite-sql-eval-master\test-suite-sql-eval-master\database --etype exec
OK
easy medium hard extra all
count 248 446 174 166 1034
===================== EXECUTION ACCURACY =====================
execution 0.883 0.771 0.661 0.422 0.723
Second, in the paper titled "DAIL-SQL," you use cosine similarity, but the algorithm in the corresponding code (EUCDISMASKPRESKLSIMTHR) uses Euclidean distance. Am I choosing the wrong algorithm?
在我运行python data_preprocess.py命令时,得到了如下报错:
Traceback (most recent call last):
File "data_preprocess.py", line 131, in
schema_linking_producer(spider_dev, spider_train, spider_table, spider_db, spider_dir)
File "data_preprocess.py", line 35, in schema_linking_producer
word_emb = GloVe(kind='42B', lemmatize=True)
File "/Users/minstrel./PythonProjects/DAIL-SQL-main/utils/pretrained_embeddings.py", line 46, in init
self.glove = torchtext.vocab.GloVe(name=kind, cache=cache)
AttributeError: module 'torchtext' has no attribute 'vocab'
请问要怎么解决?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.